ABSTRACT

Unification, the fundamental operation in Prolog, can take up to 50% of the
execution  time  of  a  typical  Prolog  system.  One  approach  to  speeding  up  the 
unification  operation  is  to  perform  it  on  parallel  hardware.  Although  it  has  been 
shown  that,  in  general,  there  is  no  parallel  algorithm  for  unification  that  is  better 
than  the  best  sequential  algorithm,  there  is  a  substantial  subset  of  unifications 
which  may  be  done  in  parallel.  Identifying  these  subsets  involves  gathering  data 
using  an  extension  of  Chang's  static  data-dependency  analysis  (SDDA),  then  using 
that  data  to  schedule  the  components  of  a  unification  for  parallel  unification. 
Improvements to the information gathered by SDDA may be achieved through
procedure splitting, a source-level transformation of the program. This thesis
describes and evaluates the above-mentioned techniques and their implementation.
Results are compared to other techniques for speeding up unification. Ways
in  which  these  techniques  may  be  applied  to  the  Berkeley  PLM  machine  are  also 
described. 
Parallel Unification Scheduling In Prolog

Wayne  Citrin 
Computer  Science  Division 

Department  of  Electrical  Engineering  and  Computer  Science 
University  of  California 
Berkeley,  California 

1.  Introduction 

1.1.  Prolog 

Prolog is a logic programming language based on first-order predicate calculus
[18]. It is the implementation language being used in the Japanese Fifth
Generation Computer Project [13] and the Berkeley PLM [0] and is used for a
number of applications including expert systems and theorem proving. It has also
been employed as an implementation language for compilers [4,25,26].

It is expected that the reader is familiar with Prolog. For more information
on the language, see [5].

1.2.  Unification 

Unification  is  the  fundamental  operation  in  Prolog.  It  is  the  method  by 
which  Prolog  variables  are  assigned  values,  and  is  also  a  test  for  equality. 
Unification  is  an  operation  in  which  two  expressions  are  made  identical  by  finding 
substitutions  for  some  or  all  of  the  variables  in  the  expressions.  In  Prolog, 
unification  is  performed  when  a  procedure  is  called,  the  unification  being 
attempted between the calling subgoal and the head of the clause being called.
Unification may also be explicitly performed in Prolog by using the '=' operator.

In  addition  to  controlling  the  assignment  and  comparison  of  values, 
unification  affects  the  flow  of  control  in  Prolog  programs.  Failure  of  a  unification 
may  cause  backtracking  to  occur,  or  may  cause  the  next  candidate  clause  in  a 
program  to  be  tried. 

As  an  example  of  unification,  to  unify  f(X,g(h))  and  f(g(Z),g(Y)),  where  X,  Y, 
and  Z  are  variables,  one  possible  solution  would  substitute  g(Z)  for  X  and  h  for  Y. 
(Note that it is not necessary here to substitute anything for Z.) With these
substitutions, both terms would become f(g(Z),g(h)). Note that not all pairs of
expressions can be unified, nor is there always a unique unifier (substitution) for
two terms which do unify. If two terms do unify, however, there is a most general
unifier (mgu), which is unique up to renaming of variables [22]. If t and t'
are two terms to be unified, and σ is a unifier for t and t' (so that σ(t)=σ(t')),
then σ is the most general unifier of t and t' if and only if for any unifier ρ of t
and t', there exists a substitution θ such that ρ=θ∘σ (∘ is the composition
operation).
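
The example above can be reproduced with a short program. The following Python sketch is an illustration only and is not the implementation discussed in this thesis: it is a simple recursive unifier with an occurs check, and the term encoding is an assumption made for the example. Variables are strings beginning with an upper-case letter, constants are other strings, and a structure f(t1,...,tn) is encoded as the tuple ('f', t1, ..., tn). The names is_var, walk, occurs, unify, and substitute are hypothetical.

```python
def is_var(t):
    # Assumed encoding: a variable is a string starting with an upper-case letter.
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    # Follow variable bindings in substitution s until an unbound term is reached.
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    # True if variable v occurs inside term t (under substitution s).
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(t1, t2, s=None):
    # Return a substitution (dict) unifying t1 and t2, or None if none exists.
    s = {} if s is None else s
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        return None if occurs(t1, t2, s) else {**s, t1: t2}
    if is_var(t2):
        return unify(t2, t1, s)
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        for a, b in zip(t1[1:], t2[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None  # clash between different constants or functors

def substitute(t, s):
    # Apply substitution s to term t.
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, s) for a in t[1:])
    return t

# f(X,g(h)) and f(g(Z),g(Y)) from the text
s = unify(("f", "X", ("g", "h")), ("f", ("g", "Z"), ("g", "Y")))
print(s)  # {'X': ('g', 'Z'), 'Y': 'h'}
```

Applying the resulting substitution to either term with substitute yields the encoding of f(g(Z),g(h)), and Z is left unbound, as noted in the text.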


1.3.  Motivation 

In  addition  to  being  the  fundamental  operation  in  Prolog,  unification  of  terms 
consumes  about  half  of  the  execution  time  of  a  typical  Prolog  system.  Woo  [27] 
has  found  that  a  Prolog  program  executed  by  a  UNSW  Prolog  interpreter  running 
on  a  VAX  11/780  typically  spends  55-70%  of  its  total  processing  time  performing 
unifications. 

Even  in  the  much  more  efficient  Berkeley  PLM,  a  similar  amount  of  time  is 
spent  on  unifications.  In  the  PLM,  there  is  a  class  of  instructions,  known  as  get 
and  unify  instructions,  which  perform  unifications.  [See  section  0.1  for  more 
information  on  these  instructions.]  In  a  set  of  six  Prolog  benchmarks,  between 
26%  and  63%  of  the  instructions  executed  by  a  PLM  simulator  were  in  this  class. 
Similarly, measuring the percentage of microcycles spent executing these instructions
(a measurement of the absolute time spent by the PLM on unification) yields
comparable,  although  slightly  lower,  results.  [See  table  1.1.] 

Table 1.1

Unification in the Berkeley PLM

Benchmark    % Get + Unify    % Get + Unify Microcycles
query        42.16            46.23
mu           26.52            44.40
serialize    63.26            44.08
deriv        60.37            23.37
nreverse     60.38            51.51
quicksort    56.06            37.74

1.4.  Objectives 

The  evidence  suggests  that  Prolog  systems  spend  a  large  percentage  of  their 
time  performing  unifications.  This  reflects  the  ubiquitous  nature  of  unification  in 
Prolog  programs  in  general.  Reducing  the  time  spent  on  unification  would  have  a 
significant  effect  on  the  execution  of  Prolog  programs.  Woo  [27]  mentions  two 
approaches  to  reducing  the  total  time  spent  performing  unifications  in  a  Prolog 
program.  The  first  approach  is  to  reduce  the  number  of  unifications  performed, 
either  by  transforming  the  Prolog  program  [14]  or  by  selective  backtracking 
[1,23]  .  The  second  approach,  suggested  by  Woo,  is  to  create  more  efficient 
sequential  unification  implementations,  in  his  case  by  developing  an  efficient 
hardware  unification  unit. 

There is a third approach, which is to reduce the amount of time spent performing
unifications by performing at least some of the unifications simultaneously.
When unifying two terms, several subterms must usually be unified. If
some  or  all  of  these  subterms  could  be  unified  simultaneously  each  time  a 
unification  had  to  be  performed,  a  large  speedup  could  result.  This  approach 
includes  the  solution  proposed  in  this  thesis,  along  with  the  dataflow  scheme  used 
in  the  Japanese  PIM-D  machine  [16]  ,  and  a  technique  for  transforming  some 
unifications  to  term  matching,  a  problem  that  can  be  solved  quickly  in  parallel 
[10]  .  All  of  these  other  techniques  will  be  addressed  in  chapter  8. 


This  dissertation  will  address  the  problem  of  how  to  speed  up  the  unification 
operation  and  thereby  reduce  the  percentage  of  program  execution  time  spent  on 
unification.  The  solution  proposed  is  a  compile  time  technique  in  which  extensive 
preprocessing  of  a  Prolog  program  is  performed  in  order  to  determine  which 
unifications  may  be  scheduled  at  compile  time  for  later  parallel  execution.  Data 
for  making  these  determinations  is  gathered  using  the  static  data-dependency 
analysis  (SDDA)  techniques  originally  developed  by  J-H  Chang  [2]  ,  but  in  order 
to  derive  a  satisfactory  amount  of  parallelism  in  the  unification,  extensive 
refinements  have  been  made.  In  addition,  a  source-level  transformation  technique 
called  procedure  splitting,  which  is  driven  by  SDDA  information,  is  used  to 
increase  the  accuracy  of  this  data  still  further.  The  information  gathered  will 
then be used to schedule the unifications. In addition, a similar run time technique
will be discussed.

It should be noted that the three approaches mentioned above are not mutually
exclusive, and particularly that some or all of them may be used in conjunction
with the technique presented herein, which will be known as “parallel
unification scheduling.”

1.5.  Other  Parallelism 

Prolog  lends  itself  to  other  types  of  parallel  execution  [8]  .  Work  has  been 
done on the parallel execution of subgoals in a clause [2] (AND-parallelism), and
on  the  parallel  execution  of  candidate  clauses  [6,10]  (OR-parallelism).  These 
types  of  parallelism  will  not  be  addressed  here,  but  there  is  no  reason  why  they 
may  not  be  included  in  a  Prolog  implementation  along  with  parallel  unification. 


2.  Unification  -  Theoretical  Results 

The fastest known sequential algorithm for unification was found by Paterson
and Wegman [22]. It performs unification in time linear in the lengths of the two
terms to be unified.*

Dwork [11] has shown that unification is log-space complete for FP.† Yasuura
[28] has reached the same conclusion by a different method. This is not an
encouraging result. It means that it is unlikely that a parallel unification algorithm
could be significantly better than the best sequential algorithms. This is
because if a problem is log-space complete for FP, then if the problem could be
solved very quickly in parallel (say, in polylogarithmic time log^O(1)(n) using
a polynomial number of processors, a class of problems known as NC), it would
imply that NC = FP, that is to say that any problem solvable with a sequential
polynomial algorithm could be solved very quickly in parallel. This is considered
unlikely [7]. Thus, it is unlikely that any log-space complete problem can be
solved much more quickly in parallel than sequentially.

It  might  be  expected  that  Prolog  head  unification,  that  is,  unification  of  a  call 
subgoal  and  a  clause  head,  would  be  a  simpler  case  of  the  unification  problem  and 
thus  possibly  solvable  quickly  in  parallel,  since  the  two  terms  being  unified  may 
share  no  common  variables  before  the  unification  begins,  due  to  Prolog  scoping 
rules. (For example, in general unification, if the terms f(X,a) and f(b,X) are to be
unified, the two instances of X are considered to be the same variable, and substitutions
for one must also be made for the other. In Prolog, however, if the former
term is a call subgoal and the latter term is a clause head, Prolog scoping rules
specify that the two instances of X represent different variables.) It is simple, however,
to show that Prolog head unification is also log-space complete by constructing
a log-space, linear time reduction from general unification to Prolog
unification:

Given two terms t1 and t2 (which may have common variables) to be unified,
construct two new terms a(t1,t2) and a(X,X), where X is a variable that does not
occur in t1 or t2. The terms a(t1,t2) and a(X,X) have no common variables, so unifying
them would be a Prolog head-type unification. It is obvious that t1 and t2 will
unify (using general unification) if and only if a(t1,t2) and a(X,X) unify (using
Prolog  head-unification).  This  reduction  can  be  performed  in  constant  space  and 
time.  Since  Prolog  unification  is  a  subset  of  general  unification  (i.e.,  any  correct 
general  unification  algorithm  will  correctly  unify  two  terms  which  do  not  share 
variables)  and  Paterson  and  Wegman  [22]  provide  a  polynomial  (in  this  case, 
linear)  algorithm  for  general  unification,  Prolog  head  unification  is  also  in  FP. 
This means that Prolog head unification is log-space complete, so it is also
unlikely that a parallel algorithm to perform it would be significantly better than
the best sequential algorithm.

* Paterson and Wegman provide a good example of their linear unification algorithm in
operation.

† FP is the set of functions computable in polynomial time on a sequential processor.
It is equivalent to the more well-known class P of decision problems decidable in
polynomial time on a sequential machine. A problem, A, is log-space complete for a certain
class (in this case, FP) if every problem in that class can be transformed to A using a
transformation which takes space O(log(n)) and time O(n) where n is the size of the
problem being transformed.
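
The reduction from general unification to head unification described above can be written out directly. The Python sketch below is only an illustration under an assumed term encoding (variables are strings beginning with an upper-case letter, constants are other strings, and a structure is a tuple whose first element is the functor). The helper names variables and to_head_unification are hypothetical, and the fresh variable is chosen here by scanning the variables of both terms, which is done for clarity rather than to meet the space and time bounds discussed in the text.

```python
def is_var(t):
    # Assumed encoding: a variable is a string starting with an upper-case letter.
    return isinstance(t, str) and t[:1].isupper()

def variables(t):
    # Collect the set of variables occurring in term t.
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        out = set()
        for arg in t[1:]:
            out |= variables(arg)
        return out
    return set()

def to_head_unification(t1, t2):
    # Build a(t1,t2) and a(V,V), where V is a variable occurring in neither term.
    used = variables(t1) | variables(t2)
    n = 0
    while f"X{n}" in used:
        n += 1
    fresh = f"X{n}"
    return ("a", t1, t2), ("a", fresh, fresh)

# f(X,a) and f(b,X) share the variable X; the constructed pair shares none.
goal, head = to_head_unification(("f", "X", "a"), ("f", "b", "X"))
print(goal)  # ('a', ('f', 'X', 'a'), ('f', 'b', 'X'))
print(head)  # ('a', 'X0', 'X0')
```

The two returned terms have disjoint variable sets, so unifying them is a head-type unification, and it succeeds exactly when the original pair unifies.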

It should be noted, however, that these results indicate only that there is probably
no algorithm which can unify every pair of terms in polylogarithmic time on
a parallel machine. We can show that, although there are many term
pairs  that  cannot  be  unified  quickly  on  parallel  hardware,  there  are  many  others 
which  can.  For  example,  in  the  next  chapter,  we  will  show  that  attempting  to 
unify  f(X,X)  and  f(a,b)  is  inherently  sequential,  while  f(X,Y)  and  f(a,b)  may  be 
unified  quickly  on  parallel  hardware.  There  is  evidence  that  most  unifications  in 
Prolog  programs  fall  into  this  category  (section  5.2  and  [3]  ),  so  that  it  is 
profitable  to  unify  Prolog  terms  on  parallel  hardware  where  possible. 


3.  Parallel  Unification  -  An  Overview 

3.1.  Assumptions 

3.1.1.  Language 

The  context  of  the  unification  operation  is  conventional  sequential  Prolog,  for 
example  the  dialect  of  Edinburgh  Prolog  presented  by  Clocksin  and  Mellish  [5]  . 
The  semantics  of  all  operations,  including  unification,  are  assumed  to  be 
unchanged  from  this  sequential  version.  There  are  a  number  of  reasons  for  this. 
First,  if  the  Prolog  being  used  in  this  project  is  identical  to  conventional  Prolog 
except  for  the  parallelism  in  the  unification  operation,  it  will  be  possible  to  isolate 
the  amount  of  improvement  that  is  attributable  solely  to  the  parallel  unifications. 
Additionally,  simulation  of  conventional  Prolog  with  parallel  unification  can  be 
achieved  by  modifying  an  existing  Prolog  system  rather  than  by  building  one  from 
scratch.  Details  of  the  implementation  of  this  simulator  are  given  in  section  10.4. 

The second reason for assuming conventional sequential Prolog is that this is
the language chosen for the Aquarius project [8], into which it is hoped this work
will be integrated. One of the objectives of the Aquarius project is to identify and
exploit all available parallelism in a sequential Prolog program in order to achieve
maximum performance. Schemes for doing this have been suggested by Chang [2]
and Fagin [12] based on a simplified version of Conery’s AND/OR execution
model [6], and appropriate hardware has been suggested by Dobry [10].

It should be noted that there are a number of other models for parallel execution
of Prolog, which provide language extensions and semantic changes [24].
These models could incorporate the parallel unification scheme proposed here,
although no one has yet attempted to apply static data-dependency analysis
(SDDA)  to  these  models.  Even  if  SDDA  and  parallel  unification  scheduling  were 
to  be  used  in  conjunction  with  these  models,  analysis  of  the  effectiveness  of  the 
scheme  would  be  complicated  by  the  additional  performance  improvements  from 
other  parallel  enhancements  in  the  execution  model. 

3.1.2.  Computational  Model 

The  computational  model  to  be  used  has  been  chosen  for  simplicity  and 
efficiency.  Intuitively,  the  subterms  of  one  of  the  pair  of  terms  to  be  unified  are 
partitioned  into  a  schedule  (to  be  defined  in  chapter  4).  The  blocks  of  this 
schedule  are  ordered.  For  each  schedule  block,  from  first  to  last,  a  unification 
process  is  activated  for  each  subterm  in  the  block  which  unifies  that  subterm  with 
the  corresponding  subterm  in  the  other  term  being  unified.  These  unification 
operations  operate  simultaneously  and  do  not  communicate  with  each  other. 
They  may  be  considered  textual  operations  that  modify  the  subterms  being 
unified  and  all  other  variables  in  the  terms  that  are  affected  by  assignments  to  the 
variables  in  the  two  terms  being  unified.  Considering  these  processes  in  textual 
terms  avoids  the  complexities  of  parallel  memory  access.  The  result  of  two 
processes  in  the  same  schedule  block  assigning  values  to  a  variable,  or  assigning  a 
value  while  others  refer  to  it,  is  undetermined.  When  all  the  processes  in  a  given 
schedule  block  complete,  the  processes  in  the  next  schedule  block  are  activated. 
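
The execution model just described can be simulated in a few lines. The following Python sketch is a hypothetical illustration, not the simulator used in this work, and is restricted to structure-free terms (variables and constants encoded as strings, with variables beginning with an upper-case letter). Each process in a block works from the same snapshot of the bindings, and the sketch conservatively reports "undetermined" whenever two processes in one block touch the same unbound variable. The names unify_flat and run_schedule are assumptions made for the example.

```python
def is_var(t):
    # Assumed encoding: a variable is a string starting with an upper-case letter.
    return t[:1].isupper()

def walk(t, s):
    # Follow variable bindings in s until an unbound variable or constant is reached.
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify_flat(a, b):
    # Unify two dereferenced flat terms; return the new bindings, or None on failure.
    if a == b:
        return {}
    if is_var(a):
        return {a: b}
    if is_var(b):
        return {b: a}
    return None  # two different constants

def run_schedule(head_args, goal_args, schedule):
    s = {}
    for block in schedule:
        snapshot = dict(s)   # every process in a block starts from the same state
        touched = set()      # unbound variables seen by earlier processes in this block
        for i in block:      # argument positions are 1-based
            a = walk(head_args[i - 1], snapshot)
            b = walk(goal_args[i - 1], snapshot)
            mine = {t for t in (a, b) if is_var(t)}
            if mine & touched:
                return "undetermined"  # processes cannot coordinate on a shared variable
            touched |= mine
            new = unify_flat(a, b)
            if new is None:
                return "fail"
            s.update(new)
    return s

print(run_schedule(["A", "B", "C"], ["X", "Y", "Z"], [[1, 3], [2]]))
print(run_schedule(["a", "b"], ["X", "X"], [[1, 2]]))    # undetermined
print(run_schedule(["a", "b"], ["X", "X"], [[1], [2]]))  # fail
```

The last two calls use the same pair of terms: scheduling both argument positions in one block gives an undetermined result, while scheduling them in successive blocks lets the second process see the binding made by the first and fail correctly.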


Figure 3.1 shows an example of a schedule and its execution. (Note that the
schedule in figure 3.1 is not an “optimal” schedule, but illustrates how a nontrivial
schedule may have more than one step.)


f(X, Y, Z)  unified with  f(A, B, C)
  1  2  3

step 1:  X=A  Z=C   (argument positions 1 and 3)
step 2:  Y=B        (argument position 2)

one possible schedule
step 1=[1,3]
step 2=[2]


Figure 3.1 - A parallel unification


Since these unification processes are the only processes acting on the terms
being unified, there is no mechanism to resolve conflicting simultaneous assignments
to variables or to combine chains of unifications formed when a number of
variables are bound together into equivalence classes. Figure 3.2 gives examples
of both the above situations.

In figure 3.2(a), one unification process has unified X with a, the other with
b*. The result is undetermined, whereas in reality the unification should fail;
here, however, neither of the two processes has any knowledge of the actions of
the other. In figure 3.2(b), one unification process has unified X and Y (rewriting
both X and Y as some new common variable _1†); the other has unified X and Z
(rewriting both X and Z as some new common variable _2). Such unifications are
correct locally, but are incorrect in the global context of the two terms being
unified. Which variable X has been unified to (_1 or _2) is undetermined, and
there is no indication that Y and Z have been bound to each other as a result of
this operation.

* By convention, upper-case letters in Prolog, like X and Y, represent variables, and
lower-case letters, like a and b, represent constants.

† When two variables are unified, in order to avoid arbitrarily giving the unified
variables one name or the other, we give the unified variables a new, unique name. We will
use the Prolog convention of indicating generated variable names by an underscore
followed by a number.

Figure 3.2 -

a) inconsistent bindings
b) incomplete bindings

This  model  is  computationally  simple,  requiring  only  identical  unification 
processes  and  a  sequencing  mechanism.  It  can  also  be  shown  to  map  more  easily 
onto  modifications  of  existing  Aquarius  project  hardware  (chapter  9). 

In constructing schedules for parallel unification of a subgoal and a clause
head, we will always construct the schedule as a partition of the subterms of the
clause head. This is because in Prolog, execution of head unification is associated
with the called clause head and not with the calling subgoal. In a call, control is
transferred from the calling subgoal to the clause head, at which point both the
call subgoal and clause head arguments have been seen and unification can take
place. Parallel unification would require some mechanism to store subgoal arguments
and execute the call, then a number of unification mechanisms, one for
each unification proceeding in parallel. If the partition were associated with the
call subgoal, multiple call mechanisms would be required to store call arguments,
locate clause head arguments, and perform unifications. Partitioning the clause
head arguments is therefore much more efficient.

3.2.  Intuitive notion of conditions for parallel unification

Intuitively, two terms can be unified in parallel if “they have nothing to do
with each other.” From the above assumptions, two terms may “have nothing to
do with each other” if the terms contain no variables in common. Since one of
the  side  effects  of  unification  is  to  assign  values  to  variables  in  the  terms  being 
unified,  then  if  two  pairs  of  terms  being  unified  share  no  common  variables,  there 
is  no  possibility  of  two  unification  processes,  one  for  each  pair  of  unified  terms, 
assigning  values,  possibly  different,  to  the  same  variable.  In  such  a  situation,  the 
two  unification  processes  can  safely  go  about  their  unification  tasks  without 
affecting  the  other  pair  of  terms  being  unified. 

If  two  pairs  of  terms  being  unified  share  a  common  variable,  then  two 
unification  processes,  each  performing  an  independent  unification,  may  derive 
incorrect  results,  as  shown  in  the  previous  section.  The  problem  is  complicated  by 
the  fact  that  two  term  pairs  may  contain  distinctly  named  variables  which  have 
been  bound  to  each  other  or  to  a  different  common  variable.  An  assignment  to 
one  of  these  variables  will  affect  all  the  other  variables  to  which  it  has  been 
bound. 

The determination of which term pairs contain variables in common with,
or bound to variables in, other terms cannot be done through a superficial examination
of the text of the Prolog program. The execution of a program will assign
values to variables so that at different points in the execution, some variables may
be bound, and at other points they may be independent. Relationships between
terms are also influenced by the values of data input to the program. J-H Chang
has developed a technique, called static data-dependency analysis (SDDA) [2],
which determines a worst-case bound on the relationships between variables in a
Prolog program. Details of SDDA will be described in chapter 6. Here we will
briefly describe the information yielded by SDDA.

SDDA  classifies  terms  into  three  groups:  ground  terms,  coupled  terms,  and 
independent  terms. 

Ground  terms  are  terms  which  contain  no  unbound  variables.  They  may 
either  be  constants  (e.g.,  atoms  or  integers)  or  structures  whose  arguments  are  all 
ground  terms.  Ground  terms  may  appear  textually  as  variables;  the  point  is  that 
when  execution  of  a  program  reaches  the  point  in  the  text  at  which  that  term 
appears,  it  will  be  instantiated  to  a  ground  term. 

A  coupled  term  is  one  which  shares  a  common  variable  with  another  coupled 
term,  or  which  contains  a  variable  which  has  been  bound  to  a  different  variable  in 
another  term,  and  in  which  neither  variable  has  been  fully  instantiated  (i.e.,  made 
a  ground  term).  Coupled  terms  may  be  partitioned  into  equivalence  classes, 
known  as  coupling  classes,  containing  all  terms  coupled  together. 

An  independent  term  is  a  non-ground  term  which  contains  unbound  variables 
which  neither  appear  in  any  other  term  nor  are  coupled  to  variables  in  any  other 
term. 

When two terms are to be unified, SDDA may be used to partition the subterms
of each term into classes of ground, coupled, and independent terms.
The coupled subterms may be further partitioned into coupling classes.

These concepts will be more rigorously defined in later chapters. In the
remaining sections of this chapter, a number of examples will be given to show
how these concepts apply to unification.
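
To make the three groups concrete, the following Python sketch classifies the subterms of a term for one known set of bindings. This is an assumption-laden illustration and is not SDDA itself: SDDA is a static, worst-case analysis performed at compile time, whereas the sketch inspects a single concrete run-time state. The term encoding (variables as strings beginning with an upper-case letter, structures as tuples headed by the functor) and the names free_vars and classify are hypothetical.

```python
def is_var(t):
    # Assumed encoding: a variable is a string starting with an upper-case letter.
    return isinstance(t, str) and t[:1].isupper()

def free_vars(t, s):
    # Unbound variables remaining in t after following the bindings in s.
    while is_var(t) and t in s:
        t = s[t]
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        out = set()
        for arg in t[1:]:
            out |= free_vars(arg, s)
        return out
    return set()

def classify(terms, s):
    # Return 1-based positions of ground and independent terms, plus coupling classes.
    fv = [free_vars(t, s) for t in terms]
    classes = []  # (positions, variables) pairs; the variable sets stay disjoint
    for i, vs in enumerate(fv):
        if not vs:
            continue
        positions, merged = [i + 1], set(vs)
        rest = []
        for ps, cvs in classes:
            if cvs & merged:       # shares an unbound variable: same coupling class
                positions += ps
                merged |= cvs
            else:
                rest.append((ps, cvs))
        classes = rest + [(positions, merged)]
    return {
        "ground": [i + 1 for i, vs in enumerate(fv) if not vs],
        "independent": sorted(ps[0] for ps, _ in classes if len(ps) == 1),
        "coupled": sorted(sorted(ps) for ps, _ in classes if len(ps) > 1),
    }

# Arguments of f(A,B,B): A is independent, the two occurrences of B are coupled.
print(classify(["A", "B", "B"], {}))
# Arguments of f(A,A,A) once A has been bound to the constant a: all ground.
print(classify(["A", "A", "A"], {"A": "a"}))
```

A term that appears textually as a variable is reported as ground once its binding leads to a term with no unbound variables, which matches the description of ground terms given above.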

3.3.  A  simple  example 

We  are  given  a  Prolog  program  containing  the  call  subgoal  f(X,Y)  and  the 
clause  head  f(a,b).  Unifying  the  call  subgoal  and  the  clause  head  requires  that 
corresponding  subterms,  in  this  case  X  and  a,  and  Y  and  b,  be  unified.  We  would 
like  to  know  whether  X  and  a  may  be  unified  simultaneously  and  independently 
with Y and b. First assume that X and Y are independent terms, i.e., they contain
no common or coupled variables. The constants a and b are obviously ground terms and
contain no common or coupled variables. We can be satisfied that the pair X and
a and the pair Y and b “have nothing to do with each other” and can be unified
simultaneously. As can be seen in figure 3.3, X is assigned the value a on
unification, and simultaneously Y is assigned the value b. The subgoal and the
clause head unify to f(a,b).

f(X,Y)  unified with  f(a,b):  X=a  Y=b

Figure 3.3 - A correct parallel unification

Let us now assume that X and Y are coupled terms, in particular, that X and
Y have been bound to each other. If we adhere to our initial assumptions concerning
the computational model and do not wish to check that parallel
unifications are consistent, simultaneously unifying X with a and Y with b will
yield incorrect results, as shown in figure 3.4(a). Having independently assigned a
to X and b to Y, there is no way to detect that X and Y may only be assigned one
of these values and that the unification should fail. If the unifications are done
sequentially (figure 3.4(b)), the process unifying Y and b already has the information
that X (and Y) have already been assigned the value a and so the unification
fails.

3.4.  More  complex  examples 

Figure 3.4 -

a) an incorrect parallel unification
b) a correct version of (a)

When two pairs of subterms to be unified both contain coupled subterms, but
the coupled subterms are in different coupling classes, the two pairs still “have
nothing to do with each other” and can be unified simultaneously. As an example,
consider the unification of a subgoal f(A,B,B) and a clause head f(X,X,Y)
where the variables A and B are independent, as are the variables X and Y. However,
the second and third terms of f(A,B,B) are the common variable B, and so
are coupled. Likewise, the first and second terms of f(X,X,Y) are the common
variable X, and are also coupled. X and B, however, are independent of each
other before unification according to the scoping rules of Prolog. As is shown in
figure 3.5(a), we may simultaneously unify the first and third terms of the subgoal
and clause head, and subsequently unify the second terms. Although the first and
third terms contain coupled terms, which will ultimately be coupled to each other
upon completion of the entire unification, during the first step they are in different
coupling classes and therefore independent of each other. Unifying the second
terms connects the coupling classes and completes the unification.

Figure 3.5 -

a) parallel unification involving two coupling classes
b) an incorrect version of (a)


Simultaneously unifying all three pairs of terms, as in figure 3.5(b), would
violate  the  computational  model  assumption  stated  in  section  3.1  that  there  is  no 
way  to  gather  the  results  of  a  number  of  simultaneous  unification  processes  into 
equivalence  classes.  In  this  case,  process  1  has  unified  X  with  A,  and  process  2  has 
unified  X  with  B.  Since  the  unification  processes  do  not  communicate,  the  implied 
unification  of  A  and  B,  or  of  X  and  Y,  is  not  performed. 

The previous example showed that the two terms may be in different coupling
classes at one point in the unification and in the same class at a subsequent
point. It is also possible that a coupling class can be broken up as a result of
some  unifications.  Consider  the  subgoal  f(a,X,Y)  to  be  unified  with  the  clause 
head  f(A,A,A).  The  first  subterm  of  the  subgoal  is  a  ground  term,  and  the  second 
and  third  subterms  will  be  assumed  to  be  independent.  In  the  clause  head,  all 
three  subterms  are  the  common  variable  A  and  are  therefore  coupled. 


To simultaneously unify all three pairs of subterms again will yield indeterminate
results according to the computational model (figure 3.6(b)). However, if the
first subterm pair is unified first, assigning the constant a to A, the second and
third subterms of the clause head, previously coupled, now both become ground
terms and may be simultaneously unified with their corresponding subterms in the
subgoal, as shown in figure 3.6(a).
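
The step groupings in the last two examples can be reproduced by a simple greedy procedure. The Python sketch below is a hypothetical illustration and is not the compile-time scheduling technique developed in chapter 4: it works on one concrete run-time state, handles only variables and constants, and places an argument position in the current step whenever it shares no unbound variable with positions already placed in that step. The encoding (variables are strings beginning with an upper-case letter) and the name greedy_schedule are assumptions made for the example.

```python
def is_var(t):
    # Assumed encoding: a variable is a string starting with an upper-case letter.
    return t[:1].isupper()

def walk(t, s):
    # Follow variable bindings in s until an unbound variable or constant is reached.
    while is_var(t) and t in s:
        t = s[t]
    return t

def greedy_schedule(head_args, goal_args):
    # Return a list of steps (lists of 1-based positions), or None if unification fails.
    s, steps = {}, []
    remaining = list(range(len(head_args)))
    while remaining:
        step, used, new = [], set(), {}
        for i in remaining:
            a, b = walk(head_args[i], s), walk(goal_args[i], s)
            vars_ = {t for t in (a, b) if is_var(t)}
            if vars_ & used:
                continue           # coupled to a pair already in this step: defer
            used |= vars_
            step.append(i)
            if a != b:
                if is_var(a):
                    new[a] = b
                elif is_var(b):
                    new[b] = a
                else:
                    return None    # two different constants
        s.update(new)              # bindings become visible only to later steps
        steps.append([i + 1 for i in step])
        remaining = [i for i in remaining if i not in step]
    return steps

print(greedy_schedule(["X", "X", "Y"], ["A", "B", "B"]))  # [[1, 3], [2]]
print(greedy_schedule(["A", "A", "A"], ["a", "X", "Y"]))  # [[1], [2, 3]]
```

The first call corresponds to the clause head f(X,X,Y) and subgoal f(A,B,B), and the second to the clause head f(A,A,A) and subgoal f(a,X,Y); in the second case, binding A to a in the first step is what allows the remaining two positions to share a step.
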
Figure 3.6 -

a)  splitting  coupling  classes 
b)  an  incorrect  version  of  (a) 


In the previous examples, structures have not been considered. The complexities
in parallel unification scheduling involving structures will be discussed in
chapter 4. One example showing one of the additional problems with structures
will be presented here.

Consider the unification of the subgoal f(A) and the clause head f(g(X))
(figure 3.7). Recall the discussion of “atomic” units of unification in the previous
section. The clause head has two atomic units: the functor g/1 and the variable


4.  Parallel  Unification  Scheduling 

To  simplify  understanding  of  the  techniques  involved  in  parallel  unification 
scheduling,  two  versions  will  be  presented.  The  first,  simpler  version  describes 
parallel  unification  scheduling  when  it  is  assumed  that  structures  do  not  appear. 
In  this  case,  subterms  may  only  appear  textually  as  variables  or  constants,  and 
the  values  assigned  to  variables  may  only  be  variables  and  constants.  Using  these 
assumptions,  most  of  the  principles  behind  parallel  unification  scheduling  may  be 
presented  and  understood. 

Following  that,  a  second,  more  complex  version  of  parallel  unification 
scheduling  will  be  presented.  In  this  version,  subterms  may  be  structures  (whose 
arguments,  too,  may  be  structures),  and  likewise  variables  may  contain  structure 
values.  Broadening  the  assumptions  in  this  way  requires  considerable  extensions 
to  the  first  approach  and  requires  considerably  more  precise  SDDA  information  to 
derive  an  efficient  schedule. 

4.1.  Without  Structures 

As mentioned above, in this section we will describe parallel scheduling for
the unification of two terms, a clause head and a call subgoal, where the terms are
of the form $f(t_1, \ldots, t_n)$ and the subterms $t_1, \ldots, t_n$ are either variables or
constants at compile time. Likewise, we may assume that at the time of
unification (at run time) the variables in the call subgoal will be either unbound,
or bound to a constant or another variable. The variables in the clause head term
will be assumed to be unbound, as they would be at the time of a call. The
“atomic” unit of unification that is to be scheduled is the unification of a single
subterm $t_i$ of the clause head term, that is, unification of a variable or a constant,
with the corresponding subterm of the calling subgoal.

4.1.1.  Definitions 

In the definitions below, assume a clause head $t = f(t_1, \ldots, t_n)$ and a call
subgoal $t' = f(t_1', \ldots, t_n')$ which are to be unified. The subterms
$t_1, \ldots, t_n$ and $t_1', \ldots, t_n'$ are either variables or constants.

A schedule for the parallel unification of $t$ and $t'$ is a partition $\Pi$ of the set
of subterms of $t$, $\{t_1, \ldots, t_n\}$, such that if $t_i$ and $t_j$ are in a block $\Pi_k$ of $\Pi$
then $t_i$ and $t_i'$ may be unified at the same time that $t_j$ and $t_j'$ are unified.
Furthermore, the blocks are ordered, so that if $\Pi_i$ and $\Pi_j$ are blocks of $\Pi$ and $i < j$,
all elements of $\Pi_i$ must be unified with their corresponding subterms in $t'$ before
any of the elements of $\Pi_j$ are unified with their corresponding subterms. We can
represent this ordering by the notation $\Pi_i < \Pi_j$.
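
A schedule in this sense can be pictured as an ordered list of disjoint blocks of argument positions. The following is a minimal sketch of that representation only; the function name `is_schedule` and the use of 1-based argument positions are assumptions made for illustration and do not come from the text.

```python
from typing import List, Set


def is_schedule(blocks: List[Set[int]], n: int) -> bool:
    """Return True if `blocks` is an ordered partition of positions 1..n.

    The list order plays the role of the block ordering: every position
    in blocks[a] is unified before any position in blocks[b] when a < b.
    """
    seen: Set[int] = set()
    for block in blocks:
        if not block:        # blocks of a partition are non-empty
            return False
        if block & seen:     # blocks must be pairwise disjoint
            return False
        seen |= block
    return seen == set(range(1, n + 1))


# Subgoal f(a,X,Y) against clause head f(A,A,A): position 1 first,
# then positions 2 and 3 together.
print(is_schedule([{1}, {2, 3}], 3))
```

Whether such a partition is also safe is a separate question, which depends on the modes defined next.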

A mode, similar to, but not the same as, the modes proposed by Mellish [20],
is a representation of the relationships between the subterms in the clause head
and in the call subgoal. A mode is an n-tuple $(m_1, \ldots, m_n)$ where $m_i$ describes
the status of subterm $t_i$ (in the case of a head mode), or $t_i'$ (in the case of a goal
mode). Each $m_i$ has a value of $g$, $c_j$, or $i$, where $g$ indicates that the subterm is a
ground term (a constant or a variable which has been assigned a constant value),
$c_j$ indicates that the subterm is a coupled term, in this case a variable which has
been bound to other variables, but not to a constant (and where all the subterms
to which it is coupled also have the mode $c_j$; there is a different value of $j$ for
each group of coupled variables), and $i$ denotes an independent term, a term
which is neither a ground term nor a coupled term. A mode that describes the
relation between subterms in a clause head is a head mode, and a mode which
describes the relation between subterms in a call subgoal is a goal mode.

A current mode $C_i$ is a pair $(G_i, H_i)$ describing the relationships between
all subterms in the clause head and call subgoal being unified after all subterms in
$\Pi_1$ through $\Pi_i$ are unified with their corresponding subterms in the call subgoal
$t'$. $G_i$ in the above pair is the goal mode and $H_i$ is the head mode. $C_0$ is known
as the entry mode and will be discussed in more detail below. It will be shown
in the next section that $C_i$ may be calculated from $C_{i-1}$ and $\Pi_i$. Thus, any $C_i$
can be calculated from $C_0$ and $\Pi_1$ through $\Pi_i$. The actual text of $t$ and $t'$ is
not necessary.

The  entry  mode  C0  is  composed  of  the  goal  entry  mode  G0  and  the  head 
entry  mode  H0.  G0  is  calculated  using  static  data-dependency  analysis  as 
described  in  chapter  6.  For  the  moment,  we  will  simply  assume  that  it  exists  and 
is  available.  It  represents  the  coupling  relationships  between  subterms  of  the  call 
subgoal  at  the  time  of  the  call  but  before  any  unification  with  the  clause  head  has 
begun. 

H0,  the  head  entry  mode,  can  be  easily  derived  from  the  text  of  the  clause 
head  itself.  This  is  because  in  Prolog  the  variables  in  a  clause  are  unbound  upon 
entry  to  that  clause.  Each  invocation  of  a  clause  results  in  a  new  set  of  variables 
unrelated  to  any  other  set  which  may  still  be  active.  Every  subterm  which  is  a 
constant  is  of  course  a  ground  term.  Any  subterms  which  are  commonly  named 
variables  are  coupled  terms  in  the  same  coupling  class.  Any  subterm  which  is  a 
variable  that  appears  only  once  in  the  clause  head  is  an  independent  term.  Figure 
4.1  shows  some  clause  heads  and  their  head  entry  modes. 

Figure 4.1 - Head Entry Modes
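
The derivation of the head entry mode from the clause text can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `head_entry_mode`, the encoding of modes as the strings `"g"`, `"i"`, and `"c1"`, `"c2"`, ..., and the treatment of the anonymous variable `_` as always independent are choices made here, not taken from the text.

```python
from collections import Counter
from typing import Dict, List


def head_entry_mode(args: List[str]) -> List[str]:
    """Compute H0 for a structure-free clause head.

    Each argument is a Prolog token. A token starting with an upper-case
    letter or '_' is treated as a variable, anything else as a constant.
    """
    def is_var(tok: str) -> bool:
        return tok[0].isupper() or tok[0] == "_"

    # Count occurrences of each named variable in the head.
    counts = Counter(a for a in args if is_var(a) and a != "_")
    class_of: Dict[str, str] = {}
    mode: List[str] = []
    for a in args:
        if not is_var(a):
            mode.append("g")                       # constant: ground term
        elif a == "_" or counts[a] == 1:
            mode.append("i")                       # appears once: independent
        else:
            if a not in class_of:                  # repeated: coupled
                class_of[a] = f"c{len(class_of) + 1}"
            mode.append(class_of[a])
    return mode


print(head_entry_mode(["A", "B", "B"]))   # ['i', 'c1', 'c1']
print(head_entry_mode(["A", "A", "A"]))   # ['c1', 'c1', 'c1']
```

The goal entry mode cannot be obtained this way, since it depends on bindings made before the call; that is why it is left to SDDA.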
4.1.2. Determining Schedule Safety

We  now  define  the  notion  of  a  safe  schedule  for  parallel  unification  of  two 
terms  by  providing  an  algorithm  for  determining  a  schedule's  safety.  Such  an 
algorithm  could  conceivably  be  used  to  create  safe  schedules,  but  it  would  be 
inefficient.  The  algorithm  presented  here  is  simply  used  to  define  the  condition. 

The key to the safety of a schedule is that, with one exception, for any pair
of subterms in a schedule block, none of the four subterms involved (the two
subterms in the clause head and their corresponding subterms in the call subgoal)
may be coupled to each other. The one exception is that a subterm in a clause head
and its corresponding subterm in the call subgoal may be coupled. In fact, unifying
the coupled subterms is essentially a no-op, and such a subterm may be discarded
from the schedule. Figure 4.2 gives a simple example of this. In the
schedule given, subterm 1, the first X, is unified with its corresponding subterm,
the first A. Now, prior to the second block, which contains the second subterm,
this subterm and its correspondent in the subgoal are coupled. It is not necessary
to perform the unification. We will see, however, that when SDDA yields the
information that two terms are coupled, it really means that it is possible that
they may be coupled when execution reaches that point in the program. In other
possible cases, the two terms may be independent. However, if we know with
certainty that the two terms are coupled any time the clause is called, we may use
this optimization.

Briefly,  the  reason  that  two  coupled  terms  need  not  be  unified  is  that  the 
condition  of  coupling  implies  that  the  two  terms  are  either  identically  named 
unbound  variables,  have  already  been  unified  with  each  other,  or  have  been 
unified  with  a  common  variable.  Thus,  another  unification  would  be  redundant. 

As  mentioned  earlier,  the  schedule  is  associated  with  the  clause  head,  rather 
than  the  subgoal,  or  with  the  clause’s  procedure  as  a  whole.  There  are  a  number 
of  reasons  for  this.  The  first  reason  is  that  each  procedure  may  have  several 
clauses.  Each  clause  will  have  its  own  head  entry  mode,  H0 ,  which  can  easily  be 
computed.  The  H0's  will  likely  be  different  for  each  clause  in  the  procedure  and 
such  differences  may  require  different  schedules.  The  schedule  is  not  associated 
with  the  subgoal  because,  first,  in  Prolog  execution,  unification  cannot  begin  until 
both  the  subgoal  and  the  clause  head  have  been  seen,  i.e.,  until  control  has 
passed from the subgoal to the clause head. Secondly, the same Prolog procedures
may be called in many different ways from many different subgoals each
of  which  may  have  a  different  goal  entry  mode  and  which  may  even  have  goal 
entry  modes  that  vary  at  different  places  in  the  execution  of  the  program.  In  fact, 
in  the  static  case  (section  5.2),  goal  entry  modes  used  for  unification  scheduling 
are  actually  generalizations  of  all  entry  modes  which  may  occur  during  a  call  to 
that  clause,  combined  to  form  a  worst-case  estimate.  This  is  because  a  clause 
may  be  called  from  a  number  of  sites  in  a  program,  each  of  which  may  call  the 
clause  with  a  different  mode.  It  is  even  possible  that  a  clause  may  be  called 
several  different  ways  from  a  single  calling  site.  This  will  be  explained  further  in 
chapter  6,  but  we  may  assume  that  this  worst-case  estimate  is  actually  the  precise 
goal  mode  of  a  single  call  subgoal  which  is  the  only  site  from  which  the  clause  in 
question is called. It can be seen, however, that the only constant among the above
candidates for association with a schedule is the clause head, with its unvarying
head entry mode, and it is thus the best entity with which to associate the schedule.
Also,  and  not  least  important,  it  will  be  seen  that  such  a  scheme  maps  well  onto 
the  proposed  implementation  described  in  chapter  9. 

Algorithm  4.1  gives  a  decision  procedure  for  determining  the  safety  of  a 
schedule. 


Figure 4.2: Unifying Coupled Variables

subgoal:      ..., f(A,A), ...
clause head:  f(X,X) :-
                (1)(2)

$G_0 = (c_1, c_1)$

$\Pi_1 = \{(1)\}$

After $\Pi_1$: $G_1 = (c_1, c_1)$

$\Pi_2 = \{(2)\}$ (unnecessary)

Algorithm 4.1: decision procedure for schedule safety.

Input:

A clause head $t = f(t_1, \ldots, t_n)$ and a call subgoal $t' = f(t_1', \ldots, t_n')$.

$C_0 = (G_0, H_0)$ containing:

A goal entry mode $G_0$ computed by SDDA.

A head entry mode $H_0$ computed from $t$.

A schedule $\Pi = (\Pi_1, \ldots, \Pi_m)$ for parallel unification of $t$ and $t'$.

Output:

SAFE if schedule is safe, UNSAFE otherwise.

Algorithm:

For each schedule block $\Pi_i$ from $\Pi_1$ to $\Pi_m$

1: if there exist two terms $t_j, t_k \in \Pi_i$ ($j \ne k$) such that at least one of the pairs

$(H_{i-1,j}, H_{i-1,k})$, $(H_{i-1,j}, G_{i-1,k})$, $(G_{i-1,j}, H_{i-1,k})$, $(G_{i-1,j}, G_{i-1,k})$

[$H_{i,j}$ is the element in $H_i$ corresponding to $t_j$, similarly for $G_{i,j}$.]

is $(c_l, c_l)$ for some $c_l$, output 'UNSAFE' and halt.

else compute $C_i$ according to the function $C_i$ = next_C($C_{i-1}$, $\Pi_i$)

output 'SAFE' and halt.

Function next_C($C_{i-1}$, $\Pi_i$) is computed as follows:
for $j = 1$ to $n$

if $t_j \in \Pi_i$

$(H_i, G_i)$ = table($H_{i-1,j}$, $G_{i-1,j}$) according to table 4.1 below.

If the mode for a coupling class $c_k$ or $c_l$ is augmented (see section 6), i.e., if
we know that all subterms in that coupling class are always coupled regardless of
the way that clause is called in the program, we may modify the table as shown in
table 4.2.

Table 4.1: unification simulation table, indexed by $H_{i-1,j}$ and $G_{i-1,j}$.

a) $H_{i,j} = G_{i,j} = g$.

b) $H_{i,j} = G_{i,j} = c_m$ for some new $m$.

c) $H_{i,j} = G_{i,j} = c_k$.

d) $H_{i,j} = G_{i,j} = c_l$.

e) all elements of $H_{i-1}$ and $G_{i-1}$ which are $c_k$ or $c_l$ are replaced in $H_i$ and $G_i$
with $c_m$ for some new $m$.


Table 4.2: augmented unification simulation table, indexed by $H_{i-1,j}$ and $G_{i-1,j}$.

*) unchanged from table 4.1.

a) $H_{i,j} = g$, and all elements of $H_{i-1}$ and $G_{i-1}$ which are $c_k$ are replaced in $H_i$
and $G_i$ with $g$.

b) $G_{i,j} = g$, and all elements of $H_{i-1}$ and $G_{i-1}$ which are $c_l$ are replaced in $H_i$
and $G_i$ with $g$.


If it is known that a term is always coupled to other terms at a certain point
in a unification schedule, then if one of the coupled terms is unified with a ground
term, all of the coupled terms become ground terms. This cannot be done
without augmented information since terms in a non-augmented coupling class
may or may not actually be coupled. As is shown in chapter 6, in the absence of
augmentation, we must include potentially coupled terms in a coupling class.
Since augmentation allows us to eliminate entire coupling classes, it allows greater
opportunities for parallelism in the schedule. After unifying one element of a coupling
class with a ground term, all the other coupled terms become ground and
can be unified in parallel (assuming that the corresponding terms are not coupled).
Thus, it will be necessary to develop techniques for augmentation to take advantage
of substantial additional opportunities for parallelism; these techniques will
be explained in section 6.2.
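
The safety test and the mode simulation can be sketched together as below. This is only one plausible reading of the definitions and table legends above, not a transcription of the tables: it assumes that unifying with a ground term grounds both positions (and, for an augmented class, every member of that class), that two independent terms form a new class, that an independent term joins the class of a coupled partner, and that two distinct classes merge into a new one. The names `next_c`, `block_is_safe`, `schedule_is_safe`, the string encoding of modes, and 0-based positions are all hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

Mode = Tuple[List[str], List[str]]  # (G, H), one entry per argument position


def is_class(m: str) -> bool:
    return m.startswith("c")


def fresh(G: List[str], H: List[str]) -> str:
    """Return a coupling-class name not yet used in G or H."""
    used = {int(m[1:]) for m in G + H if is_class(m)}
    return f"c{max(used, default=0) + 1}"


def next_c(mode: Mode, block: Iterable[int],
           augmented: FrozenSet[str] = frozenset()) -> Mode:
    """Simulate unifying every position in `block` at the mode level."""
    G, H = list(mode[0]), list(mode[1])
    for j in sorted(block):
        g, h = G[j], H[j]
        if g == "g" or h == "g":
            other = h if g == "g" else g
            G[j] = H[j] = "g"
            if other in augmented:  # whole class is known to be coupled
                G = ["g" if m == other else m for m in G]
                H = ["g" if m == other else m for m in H]
        elif g == "i" and h == "i":
            G[j] = H[j] = fresh(G, H)
        elif g == "i" or h == "i":
            G[j] = H[j] = h if g == "i" else g
        elif g != h:                # two distinct classes are merged
            merged = fresh(G, H)
            G = [merged if m in (g, h) else m for m in G]
            H = [merged if m in (g, h) else m for m in H]
    return G, H


def block_is_safe(mode: Mode, block: Iterable[int]) -> bool:
    """No two distinct positions in the block may share a coupling class."""
    G, H = mode
    for j, k in combinations(sorted(block), 2):
        classes_j = {m for m in (G[j], H[j]) if is_class(m)}
        classes_k = {m for m in (G[k], H[k]) if is_class(m)}
        if classes_j & classes_k:
            return False
    return True


def schedule_is_safe(entry: Mode, schedule: List[Set[int]],
                     augmented: FrozenSet[str] = frozenset()) -> bool:
    mode = entry
    for block in schedule:
        if not block_is_safe(mode, block):
            return False
        mode = next_c(mode, block, augmented)
    return True


# Subgoal f(a,X,Y) against head f(A,A,A): G0 = (g,i,i), H0 = (c1,c1,c1).
entry = (["g", "i", "i"], ["c1", "c1", "c1"])
print(schedule_is_safe(entry, [{0, 1, 2}], frozenset({"c1"})))
print(schedule_is_safe(entry, [{0}, {1, 2}], frozenset({"c1"})))
```

In the usage example the head class `c1` is passed as augmented on the assumption that repeated occurrences of one head variable are always coupled; without that information the second schedule is rejected, which illustrates why augmentation exposes more parallelism.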


4.1.3. Scheduling Algorithm


4.1.3.1. Algorithm

The  proof  that  unification  scheduling  is  NP-complete  requires  a  number  of 
ideas  introduced  in  this  and  following  sections,  and  so  will  be  deferred  to  the  end 
of  the  chapter.  However,  given  that  the  problem  is  NP-complete,  there  are  a 
number  of  approaches  that  may  be  taken.  One  is  to  choose  at  random  a  schedule, 
test  it  according  to  the  polynomial  algorithm  of  section  4.1.2,  and  keep  trying 
schedules until a safe one is found which has a sufficiently small number of parallel
steps. Where the number of unifications to schedule is small, this may be a
feasible  approach. 

Another  approach  is  to  design  an  efficient  algorithm  which  provides  good 
(optimal  or  near-optimal)  results  in  most  cases.  Three  such  heuristic  algorithms 
will  be  presented  here. 

Two  of  the  algorithms  to  be  presented  are  “local”  heuristic  algorithms. 
Given  a  set  of  unifications  which  have  not  yet  been  scheduled,  the  algorithms 
select  the  unifications  to  be  scheduled  in  the  next  schedule  block.  The  process  is 
repeated  until  all  unifications  are  scheduled.  The  two  algorithms  differ  in  the 
method  by  which  unifications  are  selected  for  scheduling.  The  third  algorithm  is 
a  “global”  heuristic  algorithm,  in  which  the  entire  schedule  is  taken  into  account. 
In  this  algorithm,  an  initial  schedule  is  provided  and  then  repeatedly  improved  by 
migrating  unifications  upwards  (that  is,  from  the  end  toward  the  beginning  of  the 
schedule).  The  process  may  be  repeated  as  often  as  desired  or  until  no  further 
improvements are made in the schedule. All three algorithms have been implemented,
and a comparison of their performance is given in chapter 11.

In  the  global  algorithm,  a  simple  initial  schedule  is  provided,  then  repeatedly 
improved upon. This schedule may start with one unification per schedule block,
ordered simply as the unifications appear from left to right in the original
program  text,  or  perhaps  ordered  so  that  independent  pairs,  type-1  pairs,  type-2 
pairs,  and  type-3  pairs  appear  in  that  order.  (These  terms  are  defined  later,  with 
the  third  algorithm.) 

The  object  of  the  algorithm  is,  starting  from  the  end  of  the  schedule,  to  move 
unifications  to  the  front  of  the  schedule,  adding  them  to  earlier  schedule  blocks. 
This  has  the  effect  of  increasing  the  parallelism  in  earlier  blocks,  and,  hopefully, 
emptying  later  blocks  so  that  they  disappear,  thereby  shortening  the  effective 
length of the set of unifications. Moving unifications from the end forward, rather
than in the other direction, was chosen because it is necessary to recompute all
modes from the block to which the unification was moved until the end of the
schedule. The direction of migration was chosen to minimize the number of mode
recomputations.

Starting from the end of the schedule, a unification is selected and an
attempt is made to move it to an earlier block. The first block found to which
the unification may be moved (the search is made from back to front), that is,
where the candidate unification is not coupled to any other element of the block,
is the block to which the unification is moved. It might be possible to move the
unification even farther forward; if so, this will be discovered on a subsequent
iteration. After the unification has been moved, all subsequent modes are recomputed.
If any subsequent block has been made unsafe by the movement, the
unification is moved back to where it was found. If the original search yields no
block into which the unification could be moved, that is, the search reached the
first block without finding a safe destination, the unification is not moved. This
process is repeated until movement has been attempted for all unifications in all
blocks. The entire process can be repeated as many times as desired, or until no
further improvement is yielded. Algorithm 4.2 gives this global heuristic scheduling
algorithm.

It  might  be  interesting  to  formulate  this  algorithm  as  a  rewriting  system 
according  to  the  Knuth-Bendix  scheme  [17]  ,  which  it  somewhat  resembles.  A 
number  of  problems  would  have  to  be  overcome,  however.  First,  since  there  may 
be  more  than  one  solution,  it  may  be  possible  that  no  set  of  axioms  implementing 
a  scheduling  system  will  be  complete  (i.e.,  derives  exactly  one  irreducible  schedule 
from  a  given  input  schedule).  Second,  since  the  scheduling  problem  is  NP- 
complete, a Knuth-Bendix derivation will probably take exponential time to execute,
while the similar heuristic algorithm presented here will work in polynomial
time. It might be possible to limit the number of reductions in any derivation to
fall within a polynomial bound, which would usually leave the derivation incomplete,
and it is not clear how quickly the Knuth-Bendix method converges on an
optimal  schedule.  The  above  algorithm,  on  the  other  hand,  is  guaranteed  to  reach 
a  stable  solution  within  a  polynomial  bound.  Finally,  the  axioms/lemmas  required 
for  transformations  are  substantially  more  complex  than  any  of  the  toy  group 
theory  problems  presented  in  Knuth  and  Bendix’s  paper. 


Algorithm 4.2 - Global heuristic scheduling algorithm


Input:

$C_0 = (G_0, H_0)$

$t = f(t_1, \ldots, t_n)$

Output:

$\Pi = \{\Pi_1, \ldots, \Pi_m\}$

Algorithm:

given the set {1,...,n}, partition the set into
a schedule $\Pi$ so that each block $\Pi_i$
is a singleton set.

compute $C_1, \ldots, C_n$ using the next_C function.
p_size = n;

foreach i from p_size to 2
    foreach $p \in \Pi_i$
        foreach j from i-1 to 1
            if p can be safely placed in $\Pi_j$
            /* if $G_{j-1,p} = c_k$ or $H_{j-1,p} = c_k$, for some k,
               and there is no $q \in \Pi_j$ such that
               $G_{j-1,q} = c_k$ or $H_{j-1,q} = c_k$ */
            then
                move p from $\Pi_i$ to $\Pi_j$
                recompute $C_j, \ldots, C_{p\_size}$ using next_C
                if there exists $l \in j, \ldots,$ p_size such that
                $\Pi_l$ is unsafe according to $C_{l-1}$
                /* if there exist $q, r \in \Pi_l$ ($q \ne r$)
                   such that $G_{l-1,q} = c_k$ or $H_{l-1,q} = c_k$ (for some k)
                   and $G_{l-1,r} = c_k$ or $H_{l-1,r} = c_k$ */
                then move p back to $\Pi_i$ and break loop

    if $\Pi_i = \emptyset$
    then
        p_size = p_size - 1
        foreach k from i to p_size
            $\Pi_k = \Pi_{k+1}$


As  mentioned  before,  there  are  many  ways  by  which  the  original  order  may 
be  chosen.  The  left-to-right  approach  is  basically  a  random  ordering.  The 
classification  by  type  (independent,  type-1,  -2,  -3)  may  be  more  successful.  The 
description  of  the  second  local  algorithm  will  define  the  above  types  and  provide 
the  rationale  for  this  ordering. 

The remaining two algorithms are “local” algorithms, in that they are concerned
with scheduling a single schedule block from the set of schedulable
unifications and the current mode. The first approach is to simply schedule as
many unifications as possible. The second approach is to predict the influence of
each unification on the parallelism in succeeding unifications and rank them
according to that criterion. Then, as many “good” unifications as possible are
scheduled, before less good unifications are considered. This approach may not
yield as large a block as possible, but it may uncover additional parallelism for
subsequent blocks.

The first of the local algorithms may be implemented by constructing a coupling
graph from the current mode and the set of unscheduled unifications, in
which each ground and independent term is represented by a node, as is each coupling
class. Edges representing the correspondence of subterms in the two terms
being unified, and therefore unifications to be scheduled, are added to connect
nodes associated with corresponding terms. [See figure 4.3(a) for an example
of a coupling graph.] In a coupling graph, two unifications involving the same coupling
class are represented by edges incident on the same node. Choosing a safe
schedule block is equivalent to choosing a set of edges without common nodes.
Choosing the largest safe schedule block is equivalent to choosing the largest set
of edges with no common nodes [figure 4.3(b)]. This is the maximal matching
problem, which can be solved in polynomial time. Since there are unifications
leading to graphs which are not bipartite (see figure 4.3(c)), the problem can be
solved in time $O(V^4)$ where $V$ is the number of nodes in the graph [21]. In terms
of unification, $V = G + I + C$ where $G$ is the number of ground terms, $I$ is the
number of independent terms, and $C$ is the number of coupling classes. The
scheduling algorithm for parallel execution repeatedly constructs a coupling
graph, finds the maximal matching, then alters the coupling status to reflect the
newly scheduled unifications (according to the function next_C in table 4.2),
repeating the process until all unifications are scheduled. If $N$ were the total
number of subterms in the two terms being unified, then the upper limit to the
number of iterations would be $N/2$, the number of pairs to be scheduled (since in
some cases, one unification per schedule block might be necessary, as in figure
4.3(c)). $N \ge G + I + 2C$ (since each coupling group must be represented at least
twice in a unification, otherwise the variable would be merely independent). The
above algorithm performs scheduling in time $O(N \cdot V^4)$, and since $N > V$, the time
of execution is at least $O(V^5)$.
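
The construction of a coupling graph and the choice of a largest safe block can be sketched as follows. The sketch deliberately swaps the polynomial-time matching algorithm cited above for an exhaustive search over subsets of edges, which is exponential and only suitable for small illustrations. The function names `coupling_graph` and `largest_safe_block`, the string encoding of modes, and 0-based positions are assumptions.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def coupling_graph(G: List[str], H: List[str],
                   unscheduled: Iterable[int]) -> Dict[int, Edge]:
    """Map each unscheduled position j to the edge joining its two terms."""
    def node(side: str, j: int, m: str) -> Hashable:
        # A coupling class is one shared node; every ground or
        # independent term gets a private node of its own.
        return m if m.startswith("c") else (side, j)

    return {j: (node("goal", j, G[j]), node("head", j, H[j]))
            for j in unscheduled}


def largest_safe_block(edges: Dict[int, Edge]) -> Set[int]:
    """Exhaustively find a largest set of edges with no common nodes."""
    positions = sorted(edges)
    for size in range(len(positions), 0, -1):
        for subset in combinations(positions, size):
            used: Set[Hashable] = set()
            ok = True
            for j in subset:
                ends = set(edges[j])
                if ends & used:      # shares a node with a chosen edge
                    ok = False
                    break
                used |= ends
            if ok:
                return set(subset)
    return set()


# Goal mode (i,g,i) against head mode (i,c1,c1): positions 1 and 2 both
# touch class c1, so at most one of them fits in the block.
edges = coupling_graph(["i", "g", "i"], ["i", "c1", "c1"], {0, 1, 2})
print(len(largest_safe_block(edges)))   # 2
```

When three coupling classes are linked pairwise, the graph is a triangle and the search returns a single edge, which corresponds to the case where only one unification can be scheduled per block.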

The second “local” approach improves on the first in a number of ways.
First, it would be desirable to reduce the exponent in the time complexity from
five to two or three. Secondly, although each iteration produces the maximal
matching at that step, the particular choices of edges (and therefore of scheduled
unifications) may reduce the size of maximal matchings in subsequent steps. For
example, if a unification is scheduled which unifies variables in two distinct coupling
classes, in subsequent scheduling steps variables in the two classes may not
be unified simultaneously, since the initial unification combined the two previous
coupling classes into one. Had the initial unification not been done at that time,
further parallel unifications involving variables in the two coupling classes could
have been scheduled. Conversely, if a variable in a coupling class were to be
unified with a ground term, all variables coupled to it would also automatically be
unified with that ground term, and the coupling class would disappear. There
would then be nothing to keep the formerly coupled variables, now ground terms,
from being scheduled in the same block in a subsequent step.


Figure 4.3

f(X,a,Y)   $G_0 = (i, g, i)$
f(A,B,B)   $H_0 = (i, c_1, c_1)$

$G_i = (c_1, c_2, c_2)$
$H_i = (c_3, c_3, c_1)$

a) a coupling graph

b) a maximal matching for (a)

c) a non-bipartite coupling graph

The  heuristics  to  be  used  are,  first,  to  delay  combining  coupling  classes  by 
unifying  pairs  of  coupled  variables  until  it  is  unavoidable,  and  second,  to  unify 
coupled  variables  with  ground  terms  as  early  as  possible  in  order  to  eliminate 
entire  coupling  classes  and  exploit  more  parallelism. 

In  addition,  unification  pairs  composed  only  of  ground  and  independent  terms 
(having no coupled variables) may always be scheduled. These pairs, which we
will call independent pairs, may be discovered by simple inspection and need
not be included in the coupling graph.


The  algorithm  for  scheduling  a  step  first  schedules  all  independent  pairs.  It 
then  sorts  each  of  the  remaining  pairs  (which  each  contain  at  least  one  coupled 
variable) into one of $3C$ bins, where $C$ is the number of coupling classes. For
each  coupling  class,  there  are  three  bins,  one  each  for  type-1,  type-2,  and  type-3 
pairs.  The  pairs  are  divided  up  into  these  groups  as  follows. 

- Type-1 pairs contain a ground term. Unifying one of these pairs would cause
all variables coupled to the variable in the pair to become ground terms in
subsequent scheduling blocks (assuming augmented coupling information).
This increases the available parallelism in these later blocks, since scheduling
of those pairs will no longer be constrained by the previously existing coupling
class.

- Type-2 pairs are those pairs that either contain an independent term or
contain two terms which are members of the same coupling class. In the former
case, the unification of such a pair will add the independent term to the coupling
class, but since an independent term only appears once, it will not be
seen again in subsequent scheduling blocks. In the latter case, the unification
will not change the coupling configuration at all, since the terms are already
coupled. In either case, the effect on subsequent schedule blocks is neutral; it
neither increases nor decreases subsequent available parallelism.

- Type-3 pairs contain two coupled terms which are members of different coupling
classes. When a pair of this type is unified, the two coupling classes are
joined in subsequent scheduling blocks. This decreases the amount of available
parallelism in subsequent blocks, since pairs which previously involved
different coupling classes and could be scheduled to be executed in parallel
may now involve the same coupling class and must be scheduled to be unified
sequentially.
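
The classification above can be sketched as a small function over the two mode values of a pair. The function name `classify_pair` and the string encoding of modes (`"g"`, `"i"`, and class names such as `"c1"`) are assumptions made for illustration.

```python
def classify_pair(goal_mode: str, head_mode: str) -> str:
    """Classify one unification pair by the modes of its two terms."""
    pair = (goal_mode, head_mode)
    coupled = [m for m in pair if m.startswith("c")]
    if not coupled:
        return "independent"   # only ground and independent terms
    if "g" in pair:
        return "type-1"        # coupled term meets a ground term
    if "i" in pair or goal_mode == head_mode:
        return "type-2"        # neutral effect on later blocks
    return "type-3"            # joins two distinct coupling classes


for pair in [("g", "i"), ("c1", "g"), ("i", "c1"), ("c1", "c1"), ("c1", "c2")]:
    print(pair, classify_pair(*pair))
```

Each test is a constant-time comparison, which matches the later claim that recognizing the type of a pair can be done in constant time.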

Once  the  pairs  have  been  sorted,  one  pair  is  chosen  from  each  of  the  type-1 
bins.  If  a  type-1  bin  is  empty,  a  pair  is  chosen  from  that  coupling  class’  type-2 
bin.  A  type-3  pair  only  becomes  a  candidate  for  scheduling  if,  in  both  coupling 
classes  with  which  it  is  associated,  the  type-1  and  type-2  bins  are  both  empty. 
From  all  such  candidate  pairs,  a  coupling  graph  is  created,  and  the  maximal 
matching  is  found  using  the  standard  algorithm  (again,  for  non-bipartite  graphs). 
This  matching  is  translated  back  into  pairs  and  those  pairs  are  scheduled. 

Algorithm  4.3  provides  the  single-step  scheduling  algorithm. 
Algorithm 4.3: Scheduling a single step - no structures.
Input:

$C_{i-1} = (G_{i-1}, H_{i-1})$,

List $S$ of pairs not yet scheduled.

Output:

$\Pi_i$ (new schedule block),
new $S$.

Algorithm:

$\Pi_i = \{\}$;

/* initialize bins */

for k = 1 to C /* C = number of coupling classes */
    for l = 1 to 3
        $B_{k,l} = \{\}$;

foreach $j \in S$

    /* schedule independent pairs */

1:  if j is independent pair
    then $\Pi_i = \Pi_i \cup \{j\}$;
    else begin /* sort remaining pairs */
        if j is type-1
            with coupled variable in $c_k$
        then $B_{k,1} = B_{k,1} \cup \{j\}$;
        else if j is type-2
            with coupled variable in $c_k$
        then $B_{k,2} = B_{k,2} \cup \{j\}$;
        else if j is type-3
            with coupled variables in $c_k$, $c_l$
        then begin
            $B_{k,3} = B_{k,3} \cup \{j\}$;
            $B_{l,3} = B_{l,3} \cup \{j\}$;
        end
    end

/* schedule type-1 and type-2 pairs */

2:  for k = 1 to C
        if $B_{k,1} \ne \{\}$
        then begin
            choose $j \in B_{k,1}$;
            $\Pi_i = \Pi_i \cup \{j\}$;
        end
        else if $B_{k,2} \ne \{\}$
        then begin
            choose $j \in B_{k,2}$;
            $\Pi_i = \Pi_i \cup \{j\}$;
        end

/* find candidate type-3 pairs */

foreach $(l,k) \in (1, \ldots, C) \times (1, \ldots, C)$
    such that $B_{l,1} = B_{l,2} = B_{k,1} = B_{k,2} = \{\}$
    and there exists j
    such that $j \in B_{l,3}$ and $j \in B_{k,3}$
        add j to coupling graph, G;

find maximal matching, M, of G;

/* translate matching to schedule */

3:  for all $(c_l, c_k) \in M$
    begin
        find $j \in B_{l,3} \cap B_{k,3}$;
        $\Pi_i = \Pi_i \cup \{j\}$;
    end

/* generate new S */

$S = S - \Pi_i$.


The entire scheduling algorithm simply iterates over the single-step scheduling
algorithm until all subterm unifications have been scheduled. The next_C
function mentioned in algorithm 4.4 is the one initially mentioned in algorithm 4.1
and table 4.2. This function computes $C_i$ from $C_{i-1}$ and $\Pi_i$ by simulating
unification at the coupling status level. The single-step scheduling algorithm is
represented in algorithm 4.4 by the function single_step. Both algorithm 4.3 and
the first local algorithm may be used as the single_step function.

As mentioned before, all three scheduling algorithms, or rather their extensions
that handle structures, have been implemented and tested. The results are
given  in  chapter  11. 

4.1.3.2. Analysis of Scheduling Algorithms

By inspection, it can be seen that the inner loop of the first scheduling algorithm
(4.2) is executed $O(n^3)$ times for each execution of the algorithm, if $n$ is the
number of subterms to be scheduled. The outer loop is executed p_size times,
where p_size is at most $n$. For each iteration of the outer loop, the second loop is
executed once for each pair in the schedule block being examined. This can be at
most $n-1$ (if it were $n$, the schedule would be a single block and the algorithm
would have terminated). For each iteration of the second loop, the inner loop is
executed at most $n-1$ times. Hence the $O(n^3)$ execution time. The entire process
need be repeated no more than $n$ times before no further improvements
can be made. Thus, the entire scheduling process will take $O(n^4)$ time.


Algorithm 4.4: Heuristic scheduling algorithm, no structures.
Input:

$C_0 = (G_0, H_0)$,

$t = f(t_1, \ldots, t_n)$

Output:

$\Pi = (\Pi_1, \ldots, \Pi_m)$

Algorithm:

$S = \{1, \ldots, n\}$;
i = 1;
while $S \neq \{\}$
begin
    $(\Pi_i, S)$ = single_step($C_{i-1}$, S);
    $C_i$ = next_C($C_{i-1}$, $\Pi_i$);
    i = i + 1;
end

As mentioned in the previous section, the second scheduling algorithm will
take $O(n^5)$ time.

The third unification scheduling algorithm has an execution time of
$O(N \cdot \max(N, C^4))$, where N is the number of subterms in the clause head
t, and C is the number of coupling classes present in the clause head and the
call subgoal, as counted in the entry mode $C_0$.

The single_step subroutine (algorithm 4.3) can be divided into six parts. The
first part, which initializes the bins into which type-1, 2, and 3 pairs are
sorted, takes time O(C). The second part, sorting the pairs into independent
pairs and bins for coupled pairs, takes time O(N). A single iteration of the
loop can be done in constant time, since recognition of the type of pair can be
done in constant time, as can the depositing of the pair in a bin. Since the
loop acts on each element in the set S, which can be of maximum size N, N
iterations are performed, and the step can be completed in time O(N).

The  next  step,  in  which  type-1  and  type-2  pairs  are  scheduled,  takes  O(C), 
since  C  iterations  of  the  loop  are  performed,  and  each  iteration,  which  either 
determines  that  a  bin  is  empty  or  selects  a  member  of  that  bin,  is  done  in  con¬ 
stant  time. 

The location of candidate type-3 pairs can be done in time $O(P_3)$, where
$P_3$ is the number of type-3 pairs. For each type-3 pair, the type-1 and type-2
bins for the corresponding coupling classes can be inspected in constant time.
Edges in the coupling graph can be added in constant time if the graph is
represented by an adjacency matrix.

The maximal matching of the graph G can be found in time $O(C^4)$ using the
algorithm in [21].


To translate the matching to a set of pairs to be scheduled, each edge in the
matching must be matched with its corresponding pair. If the information
relating edges to pairs was stored in the adjacency matrix, each translation
can be done in constant time. Since the size of the matching M is bounded above
by $P_3$, the number of type-3 pairs, the entire translation can be accomplished
in $P_3$ iterations of the loop, so that this part takes time $O(P_3)$.

Since $P_3 \le N$, one execution of algorithm 4.3 takes time
$O(\max(N, C^4))$.

In considering the entire scheduling algorithm (4.4), the next_C subroutine
(from algorithm 4.1) takes O(N) time, since it can be implemented by going down
the two tuples $G_i$ and $H_i$ and modifying corresponding elements $G_{i,j}$
and $H_{i,j}$ if $j \in \Pi_i$, according to table 4.2. If table 4.2 indicates
that a coupling class must be changed to ground terms, or that two terms must
be joined, the relevant information may be stored so that if such coupled terms
are encountered later in $G_i$ and $H_i$, the appropriate changes may be made.

Since, in the worst case, the main loop in algorithm 4.4 will require N
iterations, one for every element of S, and each iteration takes time
$O(\max(N, C^4))$, the entire algorithm will take time
$O(N \cdot \max(N, C^4))$.
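
As a worked summary, the bound can be assembled from the six parts of single_step analysed above. The subscripted $T$ symbols are introduced here only for illustration and do not appear in the original analysis.

```latex
\begin{align*}
T_{\mathrm{single\_step}}
  &= \underbrace{O(C)}_{\text{initialize bins}}
   + \underbrace{O(N)}_{\text{sort pairs}}
   + \underbrace{O(C)}_{\text{type-1, type-2}}
   + \underbrace{O(P_3)}_{\text{candidates}}
   + \underbrace{O(C^4)}_{\text{matching}}
   + \underbrace{O(P_3)}_{\text{translation}} \\
  &= O(\max(N, C^4)) \qquad \text{since } P_3 \le N \text{ and } C \le C^4, \\
T_{4.4}
  &\le N \cdot \bigl( T_{\mathrm{single\_step}} + O(N) \bigr)
   = O\bigl(N \cdot \max(N, C^4)\bigr).
\end{align*}
```

The $O(N)$ term inside the parentheses is the cost of next_C in each iteration of the main loop.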

4.1.3.3.  Proof of Correctness

In  this  section,  we  briefly  prove  a  theorem  concerning  the  correctness  of  the 
third  scheduling  algorithm.  In  particular,  we  will  show  that  the  scheduling  algo¬ 
rithm  only  generates  safe  schedules,  that  is,  schedules  which,  when  used  as  input 
to  algorithm  4.1,  cause  that  algorithm  to  output  “SAFE”  and  halt.  Correctness 
of  the  other  scheduling  algorithms  may  be  proven  similarly. 

Definition - a schedule block $\Pi_i$ of a schedule $\Pi$ is unsafe if there
exist $j, k \in \Pi_i$, $j \neq k$, such that either $G_{i-1,j} = c_l$ or
$H_{i-1,j} = c_l$, and either $G_{i-1,k} = c_l$ or $H_{i-1,k} = c_l$, for some
$l$. A schedule block that is not unsafe is safe.

First,  we  prove  two  lemmas. 

Lemma 1 - Algorithm 4.1 outputs SAFE if and only if all blocks of $\Pi$ are
safe.

Proof:

if - If each block $\Pi_i$ of $\Pi$ is safe, then at the ith iteration of the
algorithm, $\Pi_i$ will fail to satisfy the condition of statement 1 in the
algorithm. (An unsafe block would satisfy the condition of statement 1, since
this is exactly the condition for unsafeness. If it were to satisfy the
condition, the algorithm would output UNSAFE and halt.) Since the condition is
not satisfied, the algorithm proceeds to compute $C_i$ and examines the next
block. After all blocks have been examined, the algorithm outputs SAFE and
halts.

only if - The only way that the algorithm can reach the final statement, in
which SAFE is output, is if each block $\Pi_i$ of $\Pi$ is examined and fails
to satisfy the unsafety test in step 1. Thus, algorithm 4.1 outputs SAFE only
if all blocks in the schedule are safe.


Lemma  2  -  Algorithm  4.3  only  generates  safe  schedule  blocks. 

Proof: 

The algorithm takes as input the set S of pairs which are available for
scheduling, as well as the current mode $C_{i-1}$ describing the relationship
among all the pairs, both those in S and those already scheduled. There are
three steps in the algorithm where pairs are scheduled, marked 1, 2, and 3.
In step 1, all independent pairs are scheduled. Since, by their definition, they
contain no coupled terms, obviously no two of them can share common coupled
terms, and thus they cannot cause a block to be unsafe.

In  step  2,  type-1  and  -2  pairs  are  scheduled.  None  of  them  can  share  coupled 
terms  with  the  scheduled  independent  pairs.  Nor,  since  only  one  pair  is 
selected  from  any  coupling  class’  type-1  or  -2  bins,  can  they  share  coupled 
variables  with  each  other.  Scheduling  these  pairs  will  not  cause  the  block  to 
become  unsafe. 

Finally,  in  step  3,  pairs  are  scheduled  from  candidate  type-3  pairs.  The  can¬ 
didate  pairs  are  all  chosen  from  coupling  classes  which  have  no  type-1  or  -2 
pairs  associated  with  them.  Thus,  adding  any  of  these  pairs  to  the  pairs 
already  scheduled  will  not  affect  the  safety  of  the  schedule. 

The  coupling  graph  is  then  formed  from  these  pairs.  Edges  in  the  matching 
are  chosen  so  that  none  are  incident  on  the  same  vertex.  If  two  edges  do  not 
share  a  common  vertex,  then  their  corresponding  pairs  do  not  share  a  com¬ 
mon  coupled  variable.  Thus,  by  scheduling  the  pairs  corresponding  to  the 
matching,  we  guarantee  that  none  of  them  share  a  coupled  variable  and 
therefore  none  cause  the  block  to  be  unsafe.  Thus,  algorithm  4.3  only  gen¬ 
erates  safe  blocks. 

We  can  now  prove  the  correctness  of  our  scheduling  algorithm. 

Theorem  -  Algorithm  4.4  only  generates  safe  schedules,  that  is,  schedules  that 
cause  algorithm  4.1  to  output  SAFE  and  halt. 

Proof: 

First, we show that algorithm 4.4 generates schedules. A schedule is a
partition $\Pi$ of the subterms of the clause head t. S is initially the set of
subterms (actually, the set of their indices) in t. single_step (described in
algorithm 4.3) schedules a subset of S and returns the remainder, that is, after
scheduling $\Pi_i$ from S, returns a new $S = S - \Pi_i$. Algorithm 4.4
continues iterating until S is empty. Since single_step always schedules at
least one pair from S, algorithm 4.4 will terminate. $\Pi$, as generated by
algorithm 4.4, is a partition of the subterms of t and is therefore a schedule.

Since single_step always creates a safe scheduling block (by lemma 2), each
block $\Pi_i$ in $\Pi$ is safe. Algorithm 4.1 halts and outputs SAFE if and only
if all blocks of $\Pi$ are safe, so it does so on all schedules generated by
algorithm 4.4.
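
The definition of an unsafe block can be illustrated with a small Python check. This is a sketch under stated assumptions, not algorithm 4.1 itself: modes are represented as hypothetical dictionaries from pair index to a mode string ('g', 'i', or a coupling class name such as 'c1'), and the recomputation of modes between blocks is left out.

```python
from itertools import combinations

def coupling_classes(G, H, j):
    """Coupling classes named by the goal and head mode elements of pair j.
    By assumption, any mode string starting with 'c' names a coupling class."""
    return {m for m in (G[j], H[j]) if m.startswith("c")}

def block_is_safe(G, H, block):
    """A block is safe if no two distinct pairs in it share a coupling class."""
    return all(
        not (coupling_classes(G, H, j) & coupling_classes(G, H, k))
        for j, k in combinations(sorted(block), 2)
    )

# Hypothetical modes for four pairs.
G = {1: "c1", 2: "c1", 3: "g", 4: "i"}
H = {1: "i", 2: "g", 3: "c2", 4: "i"}

safe_block = block_is_safe(G, H, {1, 3, 4})    # no class is shared
unsafe_block = block_is_safe(G, H, {1, 2})     # both pairs touch c1
```

Here `safe_block` is `True` and `unsafe_block` is `False`, mirroring the condition that two pairs in the same block may not both involve the same class $c_l$.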


4.2.  Scheduling  with  Structures 

When  adding  structures  to  the  scheduling  problem,  a  number  of  complica¬ 
tions  appear.  First,  a  finer-grained  variety  of  dependency  analysis  is  needed. 
SDDA  as  we  have  seen  it  so  far  provides  information  on  the  coupling  relationships 
of  subterms  as  a  whole.  In  other  words,  it  might  indicate  that  the  second  sub¬ 
term,  say,  is  coupled  to  the  fourth.  However,  when  structures  are  introduced,  it 
may  be  the  case  that  the  first  element  of  the  second  subterm  is  coupled  to  the  first 
element  of  the  fourth  subterm.  If  other  elements  are  independent,  there  may  be 
opportunities  for  additional  parallelism.  We  must  design  a  way  to  express  this 
finer-grained  dependency  information  in  a  reasonable  notation  so  that  good 
schedules  may  be  derived. 

Secondly,  assuming  that  this  notation  exists,  rules  for  determining  the  vali¬ 
dity  of  a  schedule  must  be  developed.  A  number  of  problems  arise  here  which 
make  this  more  complicated  than  when  structures  are  not  considered.  For  exam¬ 
ple,  when  can  the  elements  of  a  structure  be  unified  simultaneously  with  the 
unification  of  the  structure’s  functor,  and  when  must  the  functor  be  unified  first? 
Figure  4.4  gives  a  simple  example  for  each  case. 


[Figure 4.4: a) unification of ..., f(X), ... with f(g(A)) ...;
b) unification of ..., f(g(_1)), ... with f(g(A)) ..., unifying functor first]


Another  related  problem  concerns  "hidden"  structures.  A  term  at  compile 
time  may  appear  textually  as  a  variable,  while  during  the  unification  it  may  take 
on  a  structure  value.  It  is  necessary  to  take  into  account  these  hidden  structures 
as  shown  in  figure  4.5. 

It  is  also  necessary  to  consider  special  rules  for  lists,  which  are  themselves  a 
special  form  of  structure. 


[Figure 4.5 - Hidden structure elements and their effect on scheduling.
Terms shown: f(X,X,X); f(g(A),g(B),g(C)), ... with f(g(A),g(A),g(A));
f(g(A),g(A),g(C)), ... with f(g(A),g(A),g(A))]

The  third  complication  in  the  scheduling  problem  is  in  the  scheduling  algo¬ 
rithm.  Scheduling  involving  structures  and  lists  is  at  least  as  complex  as  schedul¬ 
ing  without  them.  New  heuristics  involving  lists  and  structures  must  be 
developed  and  incorporated  into  the  scheduling  algorithm. 

The remainder of this chapter is devoted to unification scheduling involving
structures and lists. The first part revises the set of definitions presented
earlier and describes the expanded mode notation. The second section expands the
scheduling rules to include structures and presents a new test for schedule
validity, and the third section presents an expanded unification scheduling
algorithm.

4.2.1.  Definitions 

The chief change in definitions from those used previously is that of the
mode. To schedule unifications when structures are included, modes are
flattened and labeled. As an example of flattening, take the term f(X,g(X,Y)).
The flattened version of this term is (X,g/2,X,Y). (Remember that the primary
functor is unified for free, and therefore does not need to be considered.) By
flattening structures and their corresponding modes, all subterms, regardless of
depth, become easily accessible for scheduling. In order to distinguish
individual subterms (for example, the two X subterms in the above flattened
structure), and to reconstruct a term from its flattened version, the subterms
are labeled. A label is a tuple indicating the position that the subterm
inhabits in the original term. If the first element of the label is $n_1$, then
its corresponding subterm is part of the $n_1$th subterm of the given term.
Likewise, if the second element of the label were $n_2$, the corresponding
subterm would be part of the $n_2$th subterm of the $n_1$th subterm of the
original term, and so on. For example, the second X would be labeled (2,1).
Functor subterms are considered to be the 0th element of a subterm, but for
convenience, the final 0 in the label is omitted. The flattened, labeled version
of f(X,g(X,Y)) would be ((X,(1)), (g/2,(2)), (X,(2,1)), (Y,(2,2))).

Another way to interpret the labels is as the path that must be taken to reach
a given subterm in the tree representation of the term, where, if the label is
$(l_1, \ldots, l_n)$, the subterm may be reached by first visiting the $l_1$th
son of the root, and then, for each $l_i$ in the label, visiting the $l_i$th son
of the node labeled $(l_1, \ldots, l_{i-1})$. Figure 4.6 demonstrates this for
the term f(X,g(X,Y)).


Figure  4.6  -  Labeling  a  term  tree 


The  formal  definitions  are  as  follows: 

Given a term $t = f(t_1, \ldots, t_n)$, the flattened term of t is the tuple
$(\hat{t}_{1,1}, \ldots, \hat{t}_{1,r_1}, \ldots, \hat{t}_{n,1}, \ldots, \hat{t}_{n,r_n})$,
where $(\hat{t}_{i,1}, \ldots, \hat{t}_{i,r_i})$ is the flattened subterm of
$t_i$.

Given a subterm $t = f(t_1, \ldots, t_n)$, the flattened subterm of t is
$(f/n, \hat{t}_{1,1}, \ldots, \hat{t}_{1,r_1}, \ldots, \hat{t}_{n,1}, \ldots, \hat{t}_{n,r_n})$,
where $(\hat{t}_{i,1}, \ldots, \hat{t}_{i,r_i})$ is the flattened subterm of
$t_i$.

For a constant term $t = f$, its flattened term is the zero-tuple (). For a
constant or variable subterm $t = f$ or $t = X$, the flattened subterm is (f) or
(X), respectively.

Given a term $t = f(t_1, \ldots, t_n)$ and its flattened term
$\hat{t} = (\hat{t}_1, \ldots, \hat{t}_m)$, the labeled, flattened term of t is
$t_L = ((\hat{t}_1, l_1), \ldots, (\hat{t}_m, l_m))$, where $l_i$ is the label
of subterm $\hat{t}_i$. The subterms of two flattened, labeled terms are
corresponding subterms if they have identical labels. The label $l_i$ of a
subterm $\hat{t}_i$ is determined as follows:

Let the label prefix of a term or subterm t (to be explained later) be
$(l_1, \ldots, l_m)$. If the ith subterm of t is a constant or variable, then
its label is $(l_1, \ldots, l_m, i)$. If the ith subterm of t, $t_i$, is a
structure $f(t_{i,1}, \ldots, t_{i,n})$, then the label of the functor f/n of
$t_i$ is $(l_1, \ldots, l_m, i)$ and the label prefix of the subterms
$t_{i,1}, \ldots, t_{i,n}$ is $(l_1, \ldots, l_m, i)$.

The  label  prefix  of  the  main  term  is  the  zero-tuple  (). 
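
The flattening and labeling rules can be sketched in a few lines of Python. The term representation used here is an assumption made for the example: a structure is a `(functor, [arguments])` tuple, and variables and constants are plain strings. The function name `flatten_labeled` is hypothetical.

```python
def flatten_labeled(term, prefix=()):
    """Return [(subterm, label), ...] for the arguments of `term`.
    The functor of `term` itself is not emitted here: for the main term it is
    unified for free, and for a nested structure the caller emits it."""
    _functor, args = term
    out = []
    for i, arg in enumerate(args, start=1):
        label = prefix + (i,)
        if isinstance(arg, tuple):              # nested structure
            name, sub_args = arg
            out.append((f"{name}/{len(sub_args)}", label))  # functor, final 0 omitted
            out.extend(flatten_labeled(arg, label))         # label becomes the prefix
        else:                                   # variable or constant
            out.append((arg, label))
    return out

# f(X, g(X, Y))
term = ("f", ["X", ("g", ["X", "Y"])])
labeled = flatten_labeled(term)
```

For f(X,g(X,Y)) this yields X with label (1), g/2 with label (2), X with label (2,1), and Y with label (2,2), matching the example given earlier. A constant main term, represented as `("f", [])`, yields the empty list.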

Given a flattened, labeled clause head
$t_L = ((\hat{t}_1, l_1), \ldots, (\hat{t}_k, l_k))$, and a flattened, labeled
subgoal $t_L' = ((\hat{t}_1', l_1'), \ldots, (\hat{t}_n', l_n'))$ (note that it
is now possible that $k \neq n$), the entry mode is $C_0 = (G_0, H_0)$, where
the goal entry mode $G_0 = ((m_1', l_1'), \ldots, (m_n', l_n'))$ and the head
entry mode $H_0 = ((m_1, l_1), \ldots, (m_k, l_k))$. Each mode element $m_i$ or
$m_i'$ may take the value s/r, denoting a functor of arity r. In section 6.2, we
will show how static data-dependency analysis can be improved to yield these
entry modes.

Subsequent modes $C_i = (G_i, H_i)$ take similar form, except that the lengths
of $G_i$ and $H_i$ may be greater than n and k, respectively, due to the
addition of hidden subterms. A mode element $m_i$ may be "hidden," in which case
it appears as $g^H$, $i^H$, $c_k^H$, or $s^H/r$. A hidden subterm is one that
did not appear in the source text of the original pair of terms being unified,
but appears later as a result of subsequent unifications. Since it does not
appear textually in the original source, it does not have to be scheduled, but
its presence may give additional information on the coupling of subterms and
will therefore have an influence on scheduling.

4.2.2.  Test  for  schedule  validity 

The basic principles governing schedule validity are the same as for the case
where structures are not included. Two pairs of subterms can be unified
simultaneously if and only if they share no coupled terms. The main differences
are that structures must be incorporated into this definition, and the next_C
function, which generates a new current mode from the previous current mode and
schedule block, must be extended to include structures.

A  few  examples  should  illustrate  the  rules  concerning  structure  unification. 
First,  we  consider  a  very  simple  example  with  no  coupled  variables: 

t'  =  f(A,B)  (subgoal) 

t  =  f(g(X,Y),Z)  (clause  head) 

The  labeled,  flattened  versions  of  the  terms  are: 
[The preceding page was missing from the film.]

Goal:  t' = f(g(X,Y))

Head:  t = f(g(A,B))

$t_L'$ = ((g/2,(1)), (X,(1,1)), (Y,(1,2)))
$t_L$ = ((g/2,(1)), (A,(1,1)), (B,(1,2)))

Again,  assume  that  X  and  Y  are  independent  at  the  call.  Of  course,  A  and  B 
are  independent  of  each  other.  The  entry  modes  are: 

$G_0$ = ((s/2,(1)), (i,(1,1)), (i,(1,2)))

$H_0$ = ((s/2,(1)), (i,(1,1)), (i,(1,2)))

All three elements in the head mode $H_0$ have corresponding elements in $G_0$,
so all are candidates for parallel unification. In addition, none of the three
pairs are coupled to each other, and they can therefore be scheduled to be
unified simultaneously. Thus, the schedule for this unification contains a
single block $\Pi_1$ = {(1), (1,1), (1,2)} and the final mode
$C_1 = (G_1, H_1)$ is

$G_1$ = ((s/2,(1)), ($c_1$,(1,1)), ($c_2$,(1,2)))

$H_1$ = ((s/2,(1)), ($c_1$,(1,1)), ($c_2$,(1,2)))


Additional  considerations  must  be  made  when  a  structure  is  unified  with  a 
coupled  variable.  Consider  the  following  terms: 

Goal:  t' = f(A,A)

Head:  t = f(g(X,Y),Z)

and their flattened, labeled versions:

$t_L'$ = ((A,(1)), (A,(2)))

$t_L$ = ((g/2,(1)), (X,(1,1)), (Y,(1,2)), (Z,(2)))

Assuming that A is unbound at the call, the entry modes are

$G_0$ = (($c_1$,(1)), ($c_1$,(2)))

$H_0$ = ((s/2,(1)), (i,(1,1)), (i,(1,2)), (i,(2)))

Subterms labeled (1,1) and (1,2) in the head have no corresponding subterms
in the goal and are therefore not candidates for unification at this time. Of
the remaining two subterms, only one may be unified because the corresponding
terms in the goal are coupled. We choose subterm (1). The resulting mode
$C_1 = (G_1, H_1)$ is:

$G_1$ = ((s/2,(1)), ($c_2^H$,(1,1)), ($c_3^H$,(1,2)), (s/2,(2)), ($c_2^H$,(2,1)), ($c_3^H$,(2,2)))

$H_1$ = ((s/2,(1)), (i,(1,1)), (i,(1,2)), (i,(2)))

Note that unifying subterm pair (1) caused the two coupled ($c_1$) mode
elements to be replaced with functor and hidden argument mode elements. Since
subterm (1) in the goal was coupled to subterm (2), both subterms had to be
replaced, and the corresponding arguments were coupled.

At this point, we can either schedule pairs (1,1) and (1,2), or schedule pair
(2) alone. We cannot schedule (2) along with (1,1) or (1,2) because subterm (2)
in the goal has two hidden arguments, one of which is coupled to subterm (1,1)
and one to (1,2). When unifying a functor with a variable, it is necessary to
make sure that none of the functor's arguments are coupled with anything else
that is being unified at the same time. (Since hidden arguments are not
schedulable, the unification processor unifies all hidden arguments when their
non-hidden functor is unified. Section 0 will discuss this in more detail.)
After unifying pairs (1,1) and (1,2), we get:

$G_2$ = ((s/2,(1)), ($c_2^H$,(1,1)), ($c_3^H$,(1,2)), (s/2,(2)), ($c_2^H$,(2,1)), ($c_3^H$,(2,2)))

$H_2$ = ((s/2,(1)), ($c_2$,(1,1)), ($c_3$,(1,2)), (i,(2)))

We are now free to schedule pair (2):

$G_3$ = ((s/2,(1)), ($c_2^H$,(1,1)), ($c_3^H$,(1,2)), (s/2,(2)), ($c_2^H$,(2,1)), ($c_3^H$,(2,2)))

$H_3$ = ((s/2,(1)), ($c_2$,(1,1)), ($c_3$,(1,2)), (s/2,(2)), ($c_2^H$,(2,1)), ($c_3^H$,(2,2)))

We are finished, since subterms (2,1) and (2,2) in the head are hidden and
need not be scheduled.

[Although  we  are  only  scheduling  those  subterms  which  appear  textually  in 
the  clause  head,  it  may  be  possible  to  use  hidden  structure  information  to  improve 
unification  performance.  This  will  be  touched  upon  in  section  9.4.] 

The  three  criteria,  then,  for  schedule  safety,  are  1)  presence  of  corresponding 
subterms,  2)  independence  of  simultaneously  scheduled  pairs,  and  3)  independence 
of  hidden  arguments.  All  three  criteria  must  be  addressed  in  algorithm  4.5. 
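
The three criteria can be illustrated on the f(A,A) example with a short Python sketch. Everything about the representation is an assumption made for the example: modes are dictionaries from label tuples to mode strings, a hidden element carries the suffix `^H`, and the class names `c2` and `c3` stand for whatever new, unique classes the unification of pair (1) introduces. The sketch checks a single block against a given mode and does not recompute modes.

```python
def coupling_class(m):
    """'c2' and 'c2^H' both name class 'c2'; other mode elements name none."""
    base = m[:-2] if m.endswith("^H") else m
    return base if base.startswith("c") else None

def classes_touched(G, H, j):
    """Classes involved when pair j is unified: those of the pair itself plus
    those of hidden arguments whose label has j as a proper prefix."""
    found = set()
    for mode in (G, H):
        for label, m in mode.items():
            is_pair = label == j
            is_hidden_arg = (m.endswith("^H") and len(label) > len(j)
                             and label[:len(j)] == j)
            if (is_pair or is_hidden_arg) and coupling_class(m):
                found.add(coupling_class(m))
    return found

def check_block(G, H, block):
    # Criterion 1: every scheduled label needs corresponding mode elements.
    if any(j not in G or j not in H for j in block):
        return "UNSAFE"
    # Criteria 2 and 3: no class may be touched twice, hidden arguments included.
    seen = set()
    for j in sorted(block):
        touched = classes_touched(G, H, j)
        if touched & seen:
            return "UNSAFE"
        seen |= touched
    return "SAFE"

# Entry modes for goal f(A,A) and head f(g(X,Y),Z).
G0 = {(1,): "c1", (2,): "c1"}
H0 = {(1,): "s/2", (1, 1): "i", (1, 2): "i", (2,): "i"}
# Goal mode after unifying pair (1); the head mode is unchanged.
G1 = {(1,): "s/2", (1, 1): "c2^H", (1, 2): "c3^H",
      (2,): "s/2", (2, 1): "c2^H", (2, 2): "c3^H"}

results = [
    check_block(G0, H0, {(1,), (2,)}),       # coupled through c1
    check_block(G0, H0, {(1, 1)}),           # no corresponding goal element
    check_block(G1, H0, {(1, 1), (1, 2)}),   # independent of each other
    check_block(G1, H0, {(2,), (1, 1)}),     # hidden argument of (2) coupled to (1,1)
]
```

The four checks come out UNSAFE, UNSAFE, SAFE, and UNSAFE respectively, which agrees with the choices made in the walkthrough above.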

Algorithm 4.5 - Extended decision procedure for schedule safety
Input:

A clause head $t = f(t_1, \ldots, t_n)$ and a call subgoal
$t' = f(t_1', \ldots, t_n')$.

A flattened, labeled entry mode $C_0 = (G_0, H_0)$ containing:

A goal entry mode $G_0$ computed by SDDA.
A head entry mode $H_0$ computed from t.

A schedule $\Pi = \{\Pi_1, \ldots, \Pi_m\}$ for parallel unification of t and
t' which is a partition of the set of labels of elements of $H_0$.

Output:

SAFE if schedule is safe, UNSAFE otherwise.

Algorithm:

for each schedule block $\Pi_i$ from $\Pi_1$ to $\Pi_m$
    for each $j \in \Pi_i$   /* correspondence test */
        if there is no $H_{i-1,j} \in H_{i-1}$ or $G_{i-1,j} \in G_{i-1}$
            [$H_{i,j}$ is the mode element in $H_i$ with label j.
             Similarly for $G_{i,j}$.]
            output 'UNSAFE' and halt.
    for each $j \in \Pi_i$
        if there exists a k such that
            $H_{i-1,k} = s^H/m$ (for some m), or
            $H_{i-1,k} = c_m^H$ (for some m), or
            $H_{i-1,k} = i^H$, or
            $H_{i-1,k} = g^H$, or
            $G_{i-1,k} = s^H/m$, $c_m^H$, $i^H$, or $g^H$,
            and j is a prefix of k,


    if there exist $j, k \in \Pi_i$ ($j \neq k$) such that
        at least one of the pairs

        is $(c_l, c_l)$ for some l,
        output 'UNSAFE' and halt.
    compute $C_i$ = next_C($C_{i-1}$, $\Pi_i$).

    /* note that we compute new mode using hidden arguments,
       as well as scheduled subterms. */


Function next_C($C_{i-1}$, $\Pi_i$)
returns ($C_i$)

    $C_i = C_{i-1}$
    for each $j \in \Pi_i$
        ($H_i$, $G_i$) = table($H_i$, $G_i$, j)
    return($C_i$)

Function table($H_i$, $G_i$, j)
returns ($H_i$, $G_i$)

    look up entry $H_{i,j}$/$G_{i,j}$ in table 4.3
    and modify $H_i$ and $G_i$ according to instructions,
    return new ($H_i$, $G_i$)

Table 4.3 - Extended unification simulation table. Entries are selected by the
mode elements $H_{i,j}$ and $G_{i,j}$ of the pair being unified.

            g       i       c_k     s/m
    g       a       a       b       m
    i       a       c       d       h
    c_l     e       f       g       j
    s/n     n       i       k       l

Note: all entries not mentioned remain unchanged. Also, if an element of $H_i$
or $G_i$ is hidden before being changed, it will remain hidden afterwards.

a)  $H_{i,j} = G_{i,j} = g$.

b)  if augmented, replace all $c_k$ in G with g.
    if not augmented, $H_{i,j} = G_{i,j} = g$.

c)  $H_{i,j} = G_{i,j} = c_p$ for some new, unique p.

d)  $H_{i,j} = G_{i,j} = c_k$.

e)  if augmented, replace all $c_l$ in G with g.
    if not augmented, $H_{i,j} = G_{i,j} = g$.

f)  $H_{i,j} = G_{i,j} = c_l$.

g)  replace all $c_k$ in $G_i$, $H_i$ with $c_l$.

h)  $H_{i,j} = G_{i,j} = s/n$
    let $j = (l_1, \ldots, l_p)$
    for k = 1 to n
        add $H_{i,(l_1, \ldots, l_p, k)} = i^H$ to $H_i$
        ($H_i$, $G_i$) = table($H_i$, $G_i$, $(l_1, \ldots, l_p, k)$)

i)  $H_{i,j} = G_{i,j} = s/m$
    let $j = (l_1, \ldots, l_p)$
    for k = 1 to m
        add $G_{i,(l_1, \ldots, l_p, k)} = i^H$ to $G_i$
        if $H_{i,(l_1, \ldots, l_p, k)}$ is hidden
            ($H_i$, $G_i$) = table($H_i$, $G_i$, $(l_1, \ldots, l_p, k)$)
        /* if not hidden, we schedule them explicitly
           and will have to examine them at the point at which
           they are scheduled. */

j)  for each $k \in G_i$ and $H_i$ such that $k = c_l$
        k = s/m
        let label(k) = $(l_1, \ldots, l_p)$
        for q = 1 to m
            if $k \in G_i$
                add $G_{i,(l_1, \ldots, l_p, q)} = c_{p_q}^H$ to $G_i$
                (where $p_q$ is a new, unique value)
            else add $H_{i,(l_1, \ldots, l_p, q)} = c_{p_q}^H$ to $H_i$
    let $j = (l_1, \ldots, l_p)$
    for q = 1 to m
        ($H_i$, $G_i$) = table($H_i$, $G_i$, $(l_1, \ldots, l_p, q)$)

k)  for each $z \in G_i$ and $H_i$ such that $z = c_k$
        z = s/n
        let label(z) = $(l_1, \ldots, l_p)$
        for q = 1 to n
            if $z \in G_i$
                add $G_{i,(l_1, \ldots, l_p, q)} = c_{p_q}^H$ to $G_i$
                (where $p_q$ is a new, unique value)
            else add $H_{i,(l_1, \ldots, l_p, q)} = c_{p_q}^H$ to $H_i$
    let $j = (l_1, \ldots, l_p)$
    for q = 1 to n
        if $H_{i,(l_1, \ldots, l_p, q)}$ is hidden
            ($H_i$, $G_i$) = table($H_i$, $G_i$, $(l_1, \ldots, l_p, q)$)

l)  let $j = (l_1, \ldots, l_p)$
    for k = 1 to n
        if $H_{i,(l_1, \ldots, l_p, k)}$ is hidden
        then if $G_{i,(l_1, \ldots, l_p, k)}$ does not exist
            add $G_{i,(l_1, \ldots, l_p, k)} = i^H$ to $G_i$
            /* add a placeholder */
        ($H_i$, $G_i$) = table($H_i$, $G_i$, $(l_1, \ldots, l_p, k)$)

m)  let $j = (l_1, \ldots, l_p)$
    for k = 1 to m
        add $G_{i,(l_1, \ldots, l_p, k)} = g^H$ to $G_i$
        /* expand ground term - all args are ground */
        ($H_i$, $G_i$) = table($H_i$, $G_i$, $(l_1, \ldots, l_p, k)$)

n)  let $j = (l_1, \ldots, l_p)$
    for k = 1 to n
        add $H_{i,(l_1, \ldots, l_p, k)} = g^H$ to $H_i$
        /* expand ground term - all args are ground */
        ($H_i$, $G_i$) = table($H_i$, $G_i$, $(l_1, \ldots, l_p, k)$)


Note that the table function is recursive in order to simulate unification
between (possibly) nested structures. Also, it is important to note that the
"atomic" unit of unification scheduling is an element of the flattened version
of the clause head. If, in the course of unification, a variable (one of these
atomic units) takes on a structure value, the arguments of that structure are
not schedulable. They are represented as "hidden" arguments which are implicitly
unified by the unification processor which unifies the value associated with the
variable and its corresponding subterm. If an explicit structure appears in the
clause head, its functor is an atomic unit, as is each element of the
structure's flattened subterms. Each of these elements is schedulable. Since the
arguments are schedulable, they are not represented by hidden arguments to be
implicitly unified when their functor is unified. Rather, the validity testing
algorithm waits until they appear explicitly in the schedule before their
unification is simulated, just as the unification of these arguments would wait
until they explicitly appeared in the schedule.

4.2.3.  Scheduling  algorithms 

In  this  section,  we  present  the  modifications  that  must  be  made  to  the  previ¬ 
ously  presented  scheduling  algorithms  so  that  they  may  handle  structures. 

The modification of algorithm 4.2 is straightforward, and involves a change
in the notion of when a partition block is safe. In addition to the conditions
previously mentioned, it is only safe to add the pair p to the block $\Pi_j$ if
there is an element in $G_{j-1}$ corresponding to p (i.e., if $G_{j-1,p}$
exists) and if, in $\Pi_j$, p is a structure with hidden subterms, there is no
$q \in \Pi_j$, and no r which is a subterm of such a q, which is coupled to any
of p's hidden subterms. Likewise, when recomputing subsequent modes, if
$q \in \Pi_i$ for some subsequent block $\Pi_i$, there must exist a
corresponding $G_{i-1,q}$, and no hidden subterms of elements in $\Pi_i$ may be
coupled to other elements or hidden subterms of other elements of $\Pi_i$. If
either condition fails, the block is unsafe and we must backtrack.

In extending the second heuristic algorithm, which attempts to find the
maximum maximal matching of the coupling graph, some complications arise. First,
in adding edges representing unifications to the coupling graph, only those
unifications for which a corresponding element in the current goal mode exists
may be considered. Secondly, since structure unifications involving hidden
subterms may involve more than two coupling classes, the coupling graph becomes
a coupling hypergraph. If p, for example, is a subterm pair to be scheduled in a
block $\Pi_i$ and $H_{i-1,p} = s/n$, where the arguments of $H_{i-1,p}$ are
hidden and several of them are coupled terms, the unification will interact with
all these coupling classes, so the corresponding "edge" must be incident on all
of these coupling class nodes. Thus, we have a hypergraph. Figure 4.7 gives an
example of this. Finding the maximum maximal matching of a hypergraph is
NP-complete, since it is equivalent to the n-dimensional matching problem [15],
but for graphs of the size contemplated there exist relatively efficient
exponential backtracking algorithms to find this matching, or equivalently, to
find the maximum independent set of the edge graph derived from the hypergraph.
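
A backtracking search of the kind mentioned above might look as follows in Python. This is an illustrative sketch, not the implementation evaluated in the thesis: each candidate unification is represented as a hypothetical hyperedge (a set of coupling class names), and the search returns a largest set of pairwise disjoint hyperedges. Its running time is exponential in the worst case.

```python
def max_disjoint_hyperedges(edges):
    """edges: pair id -> frozenset of coupling classes the pair touches.
    Returns a largest list of pair ids whose class sets are pairwise disjoint."""
    ids = sorted(edges)
    best = []

    def extend(pos, chosen, used):
        nonlocal best
        if len(chosen) > len(best):
            best = list(chosen)
        # Prune: even taking every remaining edge cannot beat the best so far.
        if len(chosen) + (len(ids) - pos) <= len(best):
            return
        for p in range(pos, len(ids)):
            e = edges[ids[p]]
            if not (e & used):
                chosen.append(ids[p])
                extend(p + 1, chosen, used | e)
                chosen.pop()

    extend(0, [], frozenset())
    return best

# Hypothetical hypergraph: pair p touches three classes at once.
edges = {
    "p": frozenset({"c1", "c2", "c3"}),
    "q": frozenset({"c1"}),
    "r": frozenset({"c2"}),
    "s": frozenset({"c3", "c4"}),
}
matching = max_disjoint_hyperedges(edges)
```

In this example the hyperedge p conflicts with every other edge, so the largest disjoint set is q, r, and s.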

Figure 4.7 - A hyperedge in a coupling hypergraph

In the third heuristic algorithm, we must determine where unifications
involving structures fit into our rankings. Table 4.4 shows how these
unifications are ranked.

An  argument  of  a  structure  is  externally  coupled  if  it  is  coupled  to  a  term 
outside  the  structure. 

It  should  be  noted  that  structure  terms  are  ranked  similarly  to  other 
unification  pairs.  The  functor  itself  is  considered  to  be  a  ground  term.  If  there 
are  two  or  more  coupling  classes  associated  with  the  unification,  it  is  type-3.  If 
there  are  none,  or  none  externally  coupled,  it  is  independent.  If  there  is  exactly 
one,  then  the  unification  is  ranked  as  whatever  the  argument  unification  would  be 
ranked  if  it  were  not  a  structure. 
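
The ranking rule in the preceding paragraph can be written out as a small Python function. The function name, its arguments, and the string labels are hypothetical and serve only to restate the rule.

```python
def rank_structure_pair(external_classes, single_argument_rank):
    """external_classes: coupling classes of externally coupled arguments.
    single_argument_rank: the rank ('type-1' or 'type-2') that the one coupled
    argument's unification would receive if it were not inside a structure."""
    if len(external_classes) == 0:
        return "independent"
    if len(external_classes) == 1:
        return single_argument_rank
    return "type-3"

ranks = [
    rank_structure_pair(set(), "type-1"),
    rank_structure_pair({"c1"}, "type-2"),
    rank_structure_pair({"c1", "c2"}, "type-1"),
]
```

The three calls give independent, type-2, and type-3 respectively.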

As  in  the  second  algorithm,  type-3  scheduling  now  involves  choosing  candi¬ 
date  subterms  from  those  for  which  corresponding  modes  exist  in  the  current  goal 
mode.  Additionally,  the  coupling  graph  is  also  a  hypergraph.  As  in  that  case,  the 
graph  can  be  transformed  to  an  edge  graph  and  the  largest  independent  set  found. 

Table 4.4 - Table of unification types

                          c      i        g        s/m
    c                     3      2        1        3
    i                     2      i        i        i/2/3
    g                     1      i        i        i/1/3
    s (w/o hidden args)   1      i        i        i
    s (w/hidden args)     3      i/2/3    i/1/3    i/!/3

    i     - independent pair
    1     - type-1 pair
    2     - type-2 pair
    3     - type-3 pair
    i/2/3 - If not externally coupled, independent pair. Else if exactly one
            externally coupled argument, type-2 pair. Else type-3 pair.
    i/1/3 - If not externally coupled, independent pair. Else if exactly one
            externally coupled argument, type-1 pair. Else type-3 pair.
    i/!/3 - If not externally coupled, independent pair. Else if exactly one
            externally coupled argument, then type-2 if coupled argument
            corresponds to an i argument, otherwise type-1. Else type-3 pair.

4.3. Lists

Up to now, it has not been shown how lists fit into the scheme presented here. Lists are a special case of a Prolog structure. As in LISP, the common list notation is shorthand for a more cumbersome car/cdr structure notation. In Prolog, the list functor is ./2, where the first argument is the car of the list and the second argument is the cdr. Thus, [1,2,3] is shorthand for .(1,.(2,.(3,nil))). An alternative is to consider lists to be variable-arity structures, but such an approach would not fit in well with the generality of the scheduling scheme described here, or with static data-dependency analysis. The solution to be used here is to transform all lists into their structure form before SDDA and scheduling are performed. If this is done, no additions need be made to SDDA or the scheduling algorithm.
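
The list-to-structure transformation can be illustrated with a short sketch. This is not the thesis's implementation: the function name `list_to_struct` and the textual output format are assumptions made for illustration, and `nil` is used as the empty-list atom to match the example above.

```python
def list_to_struct(items, tail="nil"):
    """Rewrite a (possibly nested) list into ./2 structure notation.

    Hypothetical helper: builds the term as a string, working from the
    end of the list so that each element becomes the car of a ./2 term
    whose cdr is the structure built so far.
    """
    term = tail
    for item in reversed(items):
        # Nested lists are themselves rewritten before being embedded.
        arg = list_to_struct(item) if isinstance(item, list) else str(item)
        term = ".(%s,%s)" % (arg, term)
    return term


flat = list_to_struct([1, 2, 3])
nested = list_to_struct([1, [2]])
empty = list_to_struct([])
```

After this rewriting, every list is an ordinary binary structure, so the structure rankings of Table 4.4 apply to it without any special case.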

Some  Prolog  implementations  may  have  special  instructions  or  data  struc¬ 
tures  designed  to  increase  the  efficiency  of  list  handling.  In  some  cases,  it  may  be 
possible  to  recover  these  optimizations  by  use  of  implementation-dependent 
peephole  optimizations.  An  example  of  this,  for  the  parallel  version  of  the  Berke¬ 
ley  PLM,  will  be  presented  in  chapter  9. 

4.4.  Complexity  of  problem 

In order to prove that unification scheduling is NP-complete, we will first prove NP-completeness for a simple special case that does not satisfy all the requirements of schedule safety. This special case will be called "simple unification scheduling" and differs from unification scheduling in that unifying two coupling classes does not join the classes, nor does unifying a ground term with a coupled term cause all of the coupled terms to become ground.


DEFINITION.

A schedule $\Pi$ is considered safe for simple unification scheduling under entry mode $C_0=(G_0,H_0)$ if, for each $\Pi_i \in \Pi$ and for each $j,k \in \Pi_i$ ($j \neq k$), none of the following are true:

i) $G_{0,j}=c_l$ and $H_{0,k}=c_l$ (for some $c_l$)
ii) $G_{0,j}=c_l$ and $G_{0,k}=c_l$
iii) $H_{0,j}=c_l$ and $H_{0,k}=c_l$
iv) $H_{0,j}=c_l$ and $G_{0,k}=c_l$

In  other  words,  no  distinct  unifications  in  a  given  schedule  step  may  involve 
terms  in  the  same  coupling  class.  Note  that  nothing  is  said  here  about  joining  or 
grounding  coupling  classes.  Also  note  that  structure  subterms  are  not  considered. 
They  will  be  considered  at  the  end. 

DEFINITION.

The simple unification scheduling problem (SUS) is given as follows: Given an entry mode $C_0=(G_0,H_0)$ and a schedule size $D \in Z^+$, is there a schedule $\Pi$ with $D$ steps which is safe under simple unification scheduling?

THEOREM : 

SUS  is  NP-complete. 

Proof: 

It is simple to show that SUS is in NP. Given an entry mode $C_0=(G_0,H_0)$, create a possible schedule from the elements of the head entry mode. Test this schedule for safety under simple unification. Assuming that the head entry mode has $n$ elements, the worst case would be that in which the schedule had exactly one step with all $n$ elements. Checking each pair would take time $O(n^2)$. The best case would be the schedule in which there were $n$ steps of one element each. Such a check would take linear time. The average case would be a schedule with $\sqrt{n}$ steps of $\sqrt{n}$ elements each. Such a schedule would take time $O(n\sqrt{n})$ to check. In any case, the test of a given schedule may be accomplished in polynomial time.

The completeness part of the proof may be demonstrated through a reduction from resource-constrained scheduling [15], which can be formulated as follows:

Given $m$ processors, a set $T$ of tasks, each of length $l(t)=1$, $r$ resources, resource bounds $B_i=1$, and resource requirements $R_i(t)$, such that $0 \leq R_i(t) \leq B_i$ for each task $t$ and resource $i$, and where each task uses no more than 2 resources, and an overall deadline $D \in Z^+$. Is there an $m$-processor schedule $\sigma$ for $T$ that meets $D$ and obeys the resource constraints?

We transform the problem as follows:

i) Order the tasks (arbitrarily) from $t_1, \ldots, t_{|T|}$.

Create a head and goal entry mode $H_0$ and $G_0$, respectively, as follows:

ii) For each resource $r_j$, $1 \leq j \leq r$, create a corresponding coupling class $c(r_j)$ as follows:

    if $r_j$ is only used by one task, $c(r_j) = i$.
    if $r_j$ is used by more than one task, $c(r_j) = c_j$.

iii) For each task $t_i$ from $t_1$ to $t_{|T|}$:

    if $t_i$ uses no resources, $G_{0,i} = H_{0,i} = i$.
    if $t_i$ uses exactly one resource, $r_j$, $G_{0,i} = i$ and $H_{0,i} = c(r_j)$.
    if $t_i$ uses two resources, $r_j, r_k$, $G_{0,i} = c(r_j)$ and $H_{0,i} = c(r_k)$.

Since the use bound on any resource is 1, it is obvious that no two tasks may attempt to use a common resource during the same step of a schedule. A one-to-one correspondence may thus be made between scheduled tasks and scheduled unifications, and it is clear that a task schedule $\sigma$ of length $D$ obeying the resource bounds exists if and only if a safe simple unification schedule $\Pi$ of length $D$ exists.

Thus, SUS is NP-complete.
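
The safety test and the transformation used in the proof can be sketched as follows. This is only an illustration under stated assumptions: the function names `is_safe_simple` and `reduce_tasks`, the string encoding of modes ('i', 'g', or a class name), and the sample task set are all hypothetical, and the processor count m of the scheduling problem is not modelled.

```python
from itertools import combinations


def is_safe_simple(schedule, G0, H0):
    """Polynomial-time safety test for simple unification scheduling.

    schedule: list of steps, each a list of subterm-pair indices.
    G0, H0:   goal and head entry modes; each element is 'i', 'g',
              or the name of a coupling class.
    A step is unsafe if two distinct pairs touch the same coupling
    class, which is what conditions i) to iv) state together.
    """
    def classes(p):
        return {m for m in (G0[p], H0[p]) if m not in ("i", "g")}

    for step in schedule:
        for j, k in combinations(step, 2):
            if classes(j) & classes(k):
                return False
    return True


def reduce_tasks(task_resources):
    """Build (G0, H0) from unit-length tasks using at most 2 resources each."""
    uses = {}
    for res in task_resources:
        for r in res:
            uses[r] = uses.get(r, 0) + 1

    def c(r):
        # A resource used by a single task cannot cause a conflict.
        return "i" if uses[r] == 1 else "c_" + str(r)

    G0, H0 = [], []
    for res in task_resources:
        rs = sorted(res)
        if len(rs) == 0:
            G0.append("i"); H0.append("i")
        elif len(rs) == 1:
            G0.append("i"); H0.append(c(rs[0]))
        else:
            G0.append(c(rs[0])); H0.append(c(rs[1]))
    return tuple(G0), tuple(H0)


# Hypothetical instance: tasks 0 and 1 both need resource "b".
G0, H0 = reduce_tasks([{"a", "b"}, {"b"}, {"c"}, set()])
one_step = is_safe_simple([[0, 1, 2, 3]], G0, H0)
two_steps = is_safe_simple([[0, 2, 3], [1]], G0, H0)
```

In this instance the single-step schedule is rejected because tasks 0 and 1 share a resource, while the two-step schedule is accepted, mirroring the correspondence between task schedules and safe unification schedules.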


We  now  define  a  more  general  concept  of  unification  safety  that  takes  into 
account  the  joining  and  grounding  of  coupling  classes. 

DEFINITION:

Given a schedule $\Pi$ and an entry mode $C_0$, two coupling classes $c_j$ and $c_k$ are joined at schedule step $\Pi_i$ if, in step $\Pi_i$, a term in coupling class $c_j$ or a class joined to $c_j$ at $\Pi_i$, and a term in coupling class $c_k$ or a class joined to $c_k$ at $\Pi_i$, are unified.

DEFINITION:

Given a schedule $\Pi$ and an entry mode $C_0$, a coupling class $c_j$ is grounded at step $\Pi_i$ if, in step $\Pi_i$, a term in coupling class $c_j$ or some other class which is joined with $c_j$ at $\Pi_i$ is unified with either a ground term or a term in a class which is grounded at step $\Pi_i$.

DEFINITION:

A schedule $\Pi$ is considered safe for general unification scheduling under entry mode $C_0=(G_0,H_0)$ if, for each $\Pi_i \in \Pi$ and for each $j,k \in \Pi_i$ ($j \neq k$), none of the following are true:

i) $G_{0,j}=c_l$ and $H_{0,k}=c_l$ (for some $c_l$)
ii) $G_{0,j}=c_l$ and $G_{0,k}=c_l$
iii) $H_{0,j}=c_l$ and $H_{0,k}=c_l$
iv) $H_{0,j}=c_l$ and $G_{0,k}=c_l$

unless $c_l$ is grounded in one of the $r$ steps immediately prior to $\Pi_i$,

AND

for each $\Pi_i \in \Pi$, for each $j,k \in \Pi_i$ ($j \neq k$) such that $G_{0,j}$ or $H_{0,j}=c_l$ and $G_{0,k}$ or $H_{0,k}=c_m$ (for some $l \neq m$), there is no step $\Pi_p$ in the $n$ steps immediately prior to $\Pi_i$ such that $c_l$ and $c_m$ are joined in $\Pi_p$.

The  above  definition  is  the  safety  criterion  expressed  in  section  4.1.2  when 
the  parameters  r  and  n  are  arbitrarily  large.  It  is  clear  that  simple  unification 


scheduling  is  the  special  case  where  r  and  n  are  both  equal  to  0. 

DEFINITION.

The general unification scheduling problem (GUS) is given as follows: Given an entry mode $C_0=(G_0,H_0)$, a schedule size $D \in Z^+$, and parameters $n$ and $r$ to the general unification safety definition, is there a schedule $\Pi$ with $D$ steps which is safe according to general unification scheduling with parameters $n$ and $r$?

THEOREM: 

GUS  is  NP-complete. 

Proof: 

1)  It  is  clear  that  GUS  is  in  NP,  since  any  possible  schedule  may  be  tested  for 
safety  in  polynomial  time  using  algorithm  4.1. 

2)  Since  SUS  is  a  special  case  of  GUS,  GUS  is  NP-complete. 

The  above  ignores  the  scheduling  of  structures.  Since  unification  without 
structures  is  a  special  case  of  unification  with  structures,  and  unification  with 
structures  is  in  NP  (using  the  polynomial  safety  test  of  algorithm  4.5),  GUS  with 
structures  is  also  NP-complete. 


5. Models of Execution

The scheduling scheme described in chapter 4 may be realized in a number
of  ways.  These  may  be  divided  into  two  broad  classes,  depending  on  whether  the 
scheduling  operation  is  performed  at  compile  time  or  at  run  time.  Each  scheme 
has  certain  advantages  and  each  provides  certain  tradeoffs.  In  addition,  special 
architectural  features  are  required  to  accommodate  the  various  schemes.  Those 
schemes  in  which  scheduling  is  performed  at  compile  time  are  known  as  static 
models  because  scheduling  is  based  on  the  static  source  representation  of  the  pro¬ 
gram  and  because  the  schedule,  once  determined,  is  not  altered  at  run  time. 
Schemes  in  which  scheduling  is  performed  at  run  time,  when  the  call  takes  place, 
are  known  as  dynamic  models,  since  the  schedule  is  based  on  the  actual  values  of 
the  variables  at  the  time  of  the  call.  Since  the  values  may  change  from  one 
instance  of  a  call  to  the  next,  the  unification  schedules  may  differ  each  time  a  call 
is  repeated. 

5.1. Dynamic Scheduling

In  the  dynamic  scheduling  model,  the  schedule  is  determined  each  time  a  call 
is  performed.  Thus,  the  schedule  is  based  on  the  actual  values  of  the  variables  in 
a  call  rather  than  on  general  predicted  values  as  provided  by  SDDA.  Conse¬ 
quently,  there  is  a  greater  likelihood  of  finding  an  optimal  schedule  in  many  cases. 

In  a  typical  dynamic  scheme,  execution  of  a  call  involves  the  following  opera¬ 
tions: 

1)  The  entry  mode  of  the  calling  subgoal  is  computed. 

2)  The  head  of  the  called  clause  is  found  and  its  entry  mode  is  computed. 

3) The subterm pairs are scheduled for unification. The scheduling algorithm may be either local or global as described in the previous chapter.

4)  The  schedule  is  executed. 

There  are  many  optimizations  and  variations  on  this  general  procedure.  For 
example,  if  execution  backtracks  to  a  call  and  a  different  clause  may  be  called, 
execution  may  be  restarted  at  step  2  above,  since  the  values  of  the  subterms  in 
the  calling  subgoal  will  not  have  changed  and  the  subgoal’s  entry  mode  need  not 
be recomputed. The entry mode of the called clause head may be different, however, and must be recomputed.

Since  variables  in  the  clause  head  are  always  unbound  on  clause  entry  and 
the  head  entry  mode  may  be  determined  simply  by  examining  the  text  of  the 
clause  head,  head  entry  modes  do  not  vary  from  call  to  call  and  may  be  pre¬ 
computed  at  compile  time  and  stored  for  later  reference.  Thus,  step  2,  above, 
may  be  replaced  with  an  operation  which  looks  up  the  pre-computed  head  entry 
mode  of  the  called  clause. 

If a local scheduling algorithm is used, the scheduling and unification operations of steps 3 and 4 may be interleaved. In particular, the unification simulation associated with the next_C function may be replaced by the unification itself, after which new modes may be recomputed. In other words, the procedure in figure 5.1a may be replaced by that in figure 5.1b.

a)
    <given $G_0,H_0$>
    - schedule $\Pi_1$
    - compute $G_1,H_1$ using next_C
    - schedule $\Pi_2$
    - compute $G_2,H_2$ using next_C
      ...
    - schedule $\Pi_n$
    - execute $\Pi_1$
    - execute $\Pi_2$
      ...
    - execute $\Pi_n$

b)
    <given $G_0,H_0$>
    - schedule $\Pi_1$
    - execute $\Pi_1$
    - compute $G_1,H_1$ from current values of $t,t'$
    - schedule $\Pi_2$
    - execute $\Pi_2$
    - compute $G_2,H_2$ from current values of $t,t'$
      ...
    - schedule $\Pi_n$

Figure 5.1
a) non-interleaved scheduling and execution
b) interleaved scheduling and execution



The interleaved execution of figure 5.1b is only an optimization if computing current modes from the actual values of the head and goal is faster than using next_C. The process can definitely be sped up, however, by executing a schedule block in parallel with the computation of the next mode using next_C, since these operations may be done in parallel. Interleaving the scheduling and unification in this way can reduce the number of scheduling and unification steps by a third.

As  an  example  of  a  dynamically  scheduled  unification  using  the  local  max¬ 
imum  independent  set  heuristic,  consider  the  following  unification.  The  calling 
subgoal  is  f(A,B,C,D)  where  B  and  C  have  been  coupled  and  A  and  D  are  indepen¬ 
dent.  The  clause  head  is  f(X,X,Y,Y).  Renaming  the  variables  so  that  coupled 
variables  have  the  same  name  and  independent  variables  have  distinct  names,  we 
have 


goal: f(_1,_2,_2,_3)

head: f(_4,_4,_5,_5)

In step 1, the goal entry mode $G_0$ is computed. It is $(i,c_1,c_1,i)$.

In step 2, the head entry mode $H_0$ is computed. It is $(c_2,c_2,c_3,c_3)$. As mentioned previously, this can be pre-computed at compile time.

In  the  interleaved  scheduling/unification  phase,  we  must  first  find  a  max¬ 
imum  independent  set.  One  such  set  consists  of  the  first  and  third  subterm  pairs. 
These  are  scheduled  for  the  first  block  and  immediately  executed.  In  parallel  with 
the  unification,  the  goal  and  head  modes  are  recomputed.  The  result  is 

goal: f(_1,_2,_2,_3)    goal mode: $(c_2,c_1,c_1,i)$
head: f(_1,_1,_2,_2)    head mode: $(c_2,c_2,c_1,c_1)$

Of the remaining unscheduled subterm pairs, either the second or the fourth alone forms a maximum independent set. We choose the second, execute it, recompute the modes, and get

goal: f(_1,_1,_1,_3)    goal mode: $(c_2,c_2,c_2,i)$
head: f(_1,_1,_1,_1)    head mode: $(c_2,c_2,c_2,c_2)$

Finally we choose the remaining fourth subterm pair and execute its unification, which completes the schedule.
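
The walk-through above can be reproduced with a small simulation. This is a sketch under simplifying assumptions, not the scheduler described in the thesis: terms are variables only (no ground terms or structures), coupling classes are tracked with a union-find table, and the maximum independent set is found by brute force. The names `Classes`, `max_independent_block` and `dynamic_schedule` are hypothetical.

```python
from itertools import combinations


class Classes:
    """Union-find over variable names; each set is one coupling class."""

    def __init__(self):
        self.parent = {}

    def find(self, v):
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def max_independent_block(pairs, pending, classes):
    """Largest set of pending pairs in which no two touch the same class."""
    for size in range(len(pending), 0, -1):
        for block in combinations(pending, size):
            seen, ok = set(), True
            for p in block:
                touched = {classes.find(pairs[p][0]), classes.find(pairs[p][1])}
                if seen & touched:
                    ok = False
                    break
                seen |= touched
            if ok:
                return list(block)
    return []


def dynamic_schedule(goal, head):
    """Interleave scheduling and (simulated) unification, block by block."""
    pairs = list(zip(goal, head))
    classes = Classes()
    pending = list(range(len(pairs)))
    schedule = []
    while pending:
        block = max_independent_block(pairs, pending, classes)
        for p in block:
            classes.union(*pairs[p])      # executing the unification joins classes
        pending = [p for p in pending if p not in block]
        schedule.append([p + 1 for p in block])   # report 1-based pair numbers
    return schedule


# Renamed goal f(_1,_2,_2,_3) and head f(_4,_4,_5,_5) from the example.
schedule = dynamic_schedule(["_1", "_2", "_2", "_3"], ["_4", "_4", "_5", "_5"])
```

With this enumeration order the first block found is pairs 1 and 3, then pair 2, then pair 4, which matches the choices made in the text; a different tie-breaking rule could equally pick pairs 1 and 4 or pairs 2 and 4 first.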

An  architecture  which  executes  such  a  dynamic  scheme  needs  a  number  of 
special  hardware  features  (figure  5.2).  First,  the  architecture  needs  to  be  able  to 
compute  the  entry  mode  of  a  call  subgoal.  Secondly,  there  needs  to  be  some  sort 
of  hardware  scheduling  unit.  In  the  above  case,  this  unit  can  choose  a  maximum 
independent  set,  but  it  may  be  a  unit  implementing  any  scheduling  algorithm.  In 
the  above  example,  the  hardware  needs  to  be  capable  of  computing  new  modes 
from  the  old  modes  and  the  last  schedule  block,  but  we  have  shown  that  a  unit 
which  can  compute  modes  from  the  terms  resulting  from  execution  of  a  schedule 
block  may  be  used  instead,  although  one  may  have  to  pay  a  price  in  efficiency. 
Finally,  the  architecture  needs  to  have  a  number  of  homogeneous  unification  units 
and  some  method  of  assigning  unification  operations  to  them. 

In  figure  5.2,  the  mode  memory  stores  the  current  modes  and  the  term 
memory  stores  the  current  values  of  the  terms.  The  mode  computer  calculates 
new  modes,  and  the  scheduler  schedules  the  subterms  for  unification.  The 
dispatcher  and  unification  units  actually  perform  the  unifications.  All  of  these 
capabilities  may  be  implemented  in  microcode  or  directly  in  hardware. 

The  advantage  of  dynamic  scheduling  is  that  precise  data  dependency  infor¬ 
mation  may  be  used  for  scheduling.  As  we  shall  see,  SDDA,  used  in  static 
scheduling,  generates  worst-case  information.  For  example,  if  SDDA  indicates 
that  two  subterms  are  coupled,  this  really  means  that  they  may  sometimes  be  cou¬ 
pled.  Since  scheduling  is  based  on  these  computed  entry  modes,  the  two  subterms 
will  not  be  unified  in  parallel  even  if,  on  some  occasion,  they  happen  to  be 
independent.  In  dynamic  scheduling,  since  we  do  scheduling  from  the  actual 
values  of  subterms,  the  coupling  information  is  precise.  However,  we  shall  see  in 
the  next  chapter  that,  in  practice,  the  worst-case  information  is  generally  very 
close  to  the  precise  information,  since  any  given  Prolog  procedure  is  likely  to  be 
called in only a few different ways, and that even where there is a difference between worst-case and precise information, that difference can be minimized or even eliminated by use of procedure splitting.

Figure 5.2 - Dynamic scheduling hardware organization

In  addition  to  having  one  not-very-decisive  advantage  over  static  scheduling, 
dynamic  scheduling  has  a  number  of  disadvantages.  First  is  the  hardware  over¬ 
head  necessary  to  implement  dynamic  scheduling.  As  shown  in  the  next  section, 
static  scheduling  requires  substantially  less  hardware.  Second  is  the  time  over¬ 
head  needed  to  compute  modes  and  schedules.  Static  scheduling  also  needs  to 
compute  these,  but  it  is  all  done  before  the  program  is  run,  while  dynamic 
scheduling  takes  time  during  the  actual  running  of  the  program.  A  dynamically 
scheduled  program  would  run  more  slowly  than  the  same  statically  scheduled  pro¬ 
gram,  although  it  would  take  more  time  to  compile  a  statically  scheduled  pro¬ 
gram.  A  statically  scheduled  program  would  presumably  be  compiled  only  once, 
however. 

A third disadvantage of the dynamic model presented here is that the same
modes  and  schedules  would  have  to  be  computed  each  time  a  call  is  made  with 
the  same  entry  modes.  This  redundant  work  can  be  avoided  if  modes  and  their 
associated  schedules  are  cached.  The  cache  would  be  an  associative  memory 
whose  key  is  the  entry  mode  and  which  would  also  contain  the  schedule  derived 
from  that  entry  mode.  Upon  executing  a  call,  the  entry  mode  would  be  computed 
and  looked  up  in  the  cache.  If  the  mode  is  found  in  the  cache,  the  associated 
schedule  is  used.  Otherwise  the  schedule  is  computed  and  added  to  the  cache. 
Such  a  scheme  would  speed  up  dynamic  scheduling,  but  would  require  a  further 
hardware  overhead.  Such  a  cache  might  have  to  be  quite  large. 
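
The caching idea can be sketched in software as a table keyed by entry mode. This is an illustration only: the thesis describes an associative memory in hardware, whereas the class `ScheduleCache`, its `lookup` method, and the stand-in scheduler below are hypothetical names introduced here.

```python
class ScheduleCache:
    """Memoize schedules by entry mode (software stand-in for the cache)."""

    def __init__(self, scheduler):
        self._scheduler = scheduler   # function (goal_mode, head_mode) -> schedule
        self._table = {}
        self.misses = 0

    def lookup(self, goal_mode, head_mode):
        key = (tuple(goal_mode), tuple(head_mode))
        if key not in self._table:
            # Mode not seen before: compute the schedule and remember it.
            self.misses += 1
            self._table[key] = self._scheduler(goal_mode, head_mode)
        return self._table[key]


def sequential(goal_mode, head_mode):
    """Stand-in scheduler: one subterm pair per step."""
    return [[p] for p in range(len(goal_mode))]


cache = ScheduleCache(sequential)
first = cache.lookup(("i", "c1", "c1", "i"), ("c2", "c2", "c3", "c3"))
second = cache.lookup(("i", "c1", "c1", "i"), ("c2", "c2", "c3", "c3"))
```

The second call with the same entry mode returns the stored schedule without invoking the scheduler again, which is the redundant work the cache is meant to remove.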

5.2. Static Scheduling

Chapters  3  and  4  have  given  the  details  of  static  scheduling.  Basically,  static 
scheduling  involves  determination  of  entry  modes  and  schedules  at  compile  time, 
and execution of these previously computed schedules at run time. The schedules
are  included  as  part  of  the  compiled  code.  Unlike  dynamic  scheduling,  static 
scheduling  requires  a  simpler  architectural  extension  (figure  5.3).  All  that  is 
needed  is  a  set  of  homogeneous  unification  units  and  a  dispatcher  to  assign  them 
unification operations. A synchronization mechanism is needed to ensure that
unifications  in  one  scheduling  block  are  not  started  before  unifications  in  the  pre¬ 
vious  one  are  completed.  This  may  either  be  a  fork/join  mechanism  or  simply 
having  the  unification  units  operate  in  lockstep.  The  latter  approach  will  be  used 
here  since  it  appears  to  be  no  less  powerful  than  fork/join  and  requires  a  simpler 
control  mechanism. 

Unlike  dynamic  scheduling,  static  scheduling  also  requires  a  substantial 
software  system  to  extract  maximum  parallelism  from  a  Prolog  program’s 
unifications. 


Figure  5.3  -  Statically  scheduled  unification  hardware 


The  disadvantages  of  the  dynamic  scheme  are  the  advantages  of  the  static 
scheme.  There  is  less  hardware  overhead  or  scheduling  overhead  at  run  time. 


Instead,  all  analysis  and  scheduling  is  done  once,  at  compile  time,  incorporated 
into  the  compiled  code,  and  is  available  for  repeated  executions  of  the  program 
without  being  recomputed. 

It  appears  that  static  scheduling  has  a  number  of  advantages  over  dynamic 
scheduling.  Because  of  this,  we  will  not  consider  dynamic  scheduling  any  further, 
but  will  instead  concentrate  on  implementing  static  scheduling. 


6.  Static  Data-Dependency  Analysis 

Static  data-dependency  analysis  (SDDA)  is  a  pre-compile  time  technique 
developed by J-H Chang [2] for determining the coupling relationships between
variables  and  for  computing  entry  and  exit  modes  for  clause  heads  and  subgoals. 
In  addition  to  its  use  in  gathering  information  for  unification  scheduling,  it  is  also 
used  for  scheduling  AND-parallelism  and  computing  intelligent  backtracking  des¬ 
tinations.  Chang  provides  a  description  of  SDDA  in  his  dissertation.  Unfor¬ 
tunately,  that  description  contains  a  number  of  errors  and  omissions,  as  well  as  an 
unrigorous  and  non-algorithmic  presentation  of  the  technique.  The  next  section 
(6.1)  presents  a  correct  and  algorithmic  description  of  SDDA  and  should  be  con¬ 
sidered  to  supersede  the  corresponding  chapter  in  [2]  .  The  following  section  (6.2) 
presents  a  number  of  enhancements  to  Chang’s  technique  which  may  be  used  to 
improve  the  data  yielded  by  SDDA  and  create  more  efficient  unification  schedules. 

6.1.  Description 

SDDA is a recursive technique by which the information contained in one or more query entry modes (i.e., modes representing the relationships among terms in a top level query) is propagated through a Prolog program so that entry and exit modes for each subgoal and clause are computed. These modes are identical to those previously described, that is, an n-ary predicate's mode is an n-tuple $(m_1, \ldots, m_n)$ where each $m_i$ is either g (a ground term), i (an independent term), or $c_j$ (a coupled term). (At this point, we are not considering structure modes. They will be considered in section 6.2.)

Intuitively,  mode  information  is  propagated  by  considering  in  turn  each  can¬ 
didate  clause  of  the  called  procedure.  For  each  subgoal  of  the  clause,  an  entry 
mode  is  computed  and  the  called  procedure  is  similarly  examined.  When  all 
subgoals  in  the  clause  have  been  examined,  an  exit  mode  for  the  clause  is  com¬ 
puted  which  is  passed  back  to  the  calling  subgoal.  In  order  to  allow  a  more 
efficient  analysis  and  avoid  infinite  recursion,  the  notion  of  “better”  and  “worse” 
modes  has  been  created.  (These  will  be  defined  shortly.)  If  a  procedure  is  about 
to  be  examined,  and  it  has  already  been  examined  with  a  worse  entry  mode,  the 
new  examination  is  abandoned.  Also,  a  list  of  all  clauses  currently  “active”  (i.e., 
being  examined)  is  maintained.  If  a  clause  is  examined  which  is  currently  active, 
and  at  least  one  of  the  activations  has  an  entry  mode  worse  than  that  of  the 
current  clause,  the  new  examination  of  that  clause  is  abandoned.  In  this  way, 
infinite  recursion  is  avoided. 

When  the  algorithm  is  completed,  each  clause  and  subgoal  in  the  program 
which  is  potentially  reachable  through  the  top-level  query  or  queries  will  possess 
an  entry  and  exit  mode.  These  represent  the  relationships  between  subterms  in 
the  predicate  (clause  head  or  subgoal)  just  before  and  after  the  predicate  is  exe¬ 
cuted,  respectively.  The  framework  of  the  algorithm  (6.1)  is  presented  below;  the 
gaps  will  be  filled  in  later.  The  reader  should  assume  the  existence  of  two  tables 
maintained  by  the  program  and  initially  empty.  One  is  a  table  of  procedures  and 
their  “worst-case”  entry  and  exit  modes  (i.e.,  the  worst  entry  and  exit  modes  with 
which  they  have  been  examined).  The  second  is  a  table  of  the  currently  active 
clauses  and  their  entry  modes.  Clauses  may  be  identified  using  the  procedure 


name  and  an  index.  Table  management  is  a  simple  task  in  Prolog;  entries  may  be 
added  and  deleted  using  assert  and  retract,  respectively,  and  lookups  are  simple 
Prolog  calls. 

Algorithm 6.1 - Static Data Dependency Analysis

Input:     query entry mode
Output:    query exit mode, and entry and exit modes for each reachable clause.
Algorithm:

main: query_exit_mode = sdda(query_entry_mode);

sdda(current_entry_mode):
    if worst-case entry mode for the procedure exists /* from table */
       and current entry mode is better than the worst-case entry mode,
       and a worst-case exit mode exists for the procedure
    then
        exit mode = worst-case exit mode for that procedure;
        return exit mode;
    else
        worst-case entry mode = owcg(current entry mode,
                                     worst-case entry mode for procedure);
        /* owcg = optimal worst-case generalization - see below */
        current entry mode = worst-case entry mode;
        replace old worst-case entry mode for procedure in table with new one;

        for each candidate clause in called procedure:
            if current clause is active with an entry mode equal to or
               worse than the current entry mode
            then
                go back to top of loop and try next clause;
            else
                add current clause and entry mode to activation table;
                create variable status V_0 from the clause head and current entry mode;
                for each subgoal i from 1 to n /* n = number of subgoals in clause */
                    if subgoal i is unify goal
                    then generate V_i from V_{i-1} and subgoal;
                    else if subgoal i is call
                    then
                        create a subgoal entry mode for the subgoal using V_{i-1};
                        subgoal exit mode = sdda(subgoal entry mode);
                        create V_i from subgoal exit mode and V_{i-1};
                    else V_i = V_{i-1}; /* for other calls */
                create current exit mode from V_n;
                worst-case exit mode = owcg(previous worst-case exit mode for
                                            procedure from table, current exit mode);
                replace old worst-case exit mode for procedure in table with new one;
                remove current clause and entry mode from activation table;
        return worst-case exit mode;

At this point, there are a number of issues which need to be resolved:

•  Defining the worse/better relationship on modes.

•  Defining the optimal worst-case generalization.

•  Defining the variable status $V_i$.

•  Generating $V_0$ from the entry mode and the clause head.

•  Generating a subgoal entry mode from the subgoal and the previous variable status.

•  Generating $V_i$ from a subgoal exit mode and $V_{i-1}$.

•  Generating a clause exit mode from $V_n$ and the clause head.

•  Generating $V_i$ from $V_{i-1}$ and a unify goal.

A  mode  is  considered  worse  than  another  mode  if  the  first  mode  is  more 
general.  In  other  words,  if  the  first  mode  will  allow  creation  of  a  schedule  which 
is  safe  (although  non-optimal)  from  the  second,  we  say  that  the  first  mode  is  worse 
than  the  second. 

A  number  of  preliminary  definitions  must  be  presented  before  the 
better/worse  relationship  is  defined. 

DEFINITION:

Let $M=(m_1, \ldots, m_n)$ and $M'=(m_1', \ldots, m_n')$ be two entry modes for the same procedure, and $c_j$ and $c_k$ be mode elements of $M$ and $M'$, respectively, representing coupled terms. $c_j$ covers $c_k$ if and only if for each $i$ such that $m_i'=c_k$, $m_i=c_j$.

In other words, $c_j$ covers $c_k$ if $\{ i \mid m_i'=c_k \} \subseteq \{ i \mid m_i=c_j \}$.

A better/worse partial order may be said to hold on corresponding individual mode elements of $M$ and $M'$ (i.e., $m_i$ and $m_i'$ for any $i$). (Notation: a > b means "a is worse than b," or "b is better than a," a = b means "a equals b," and a <> b means "there is no relationship between a and b.") The following orderings hold:

    i > g
    $c_j$ > g                 (for any $j$)
    $c_j$ > i                 (for any $j$)
    $c_j$ > $c_k$             iff $c_j$ covers $c_k$ and $c_k$ does not cover $c_j$
    i = i
    g = g
    $c_j$ = $c_k$             iff $c_j$ covers $c_k$ and $c_k$ covers $c_j$
    $c_j$ <> $c_k$            iff $c_j$ does not cover $c_k$ and $c_k$ does not cover $c_j$

Using the above relation, we can define a similar better/worse partial order for entire modes:

DEFINITION:

Let $M=(m_1, \ldots, m_n)$ and $M'=(m_1', \ldots, m_n')$ be two modes for the same procedure. Then $M$ is worse than $M'$ if and only if there exists an $i$, $1 \leq i \leq n$, such that $m_i > m_i'$ and for all other $j \neq i$, $m_j > m_j'$ or $m_j = m_j'$.

Likewise, $M=M'$ if and only if for all $i$, $1 \leq i \leq n$, $m_i=m_i'$. If neither $M>M'$, $M'>M$, nor $M=M'$ holds, then $M <> M'$.
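
The covering relation and the two orderings can be written out as a short sketch. The encoding is an assumption made for illustration: a mode is a tuple of strings, with 'g' and 'i' for ground and independent terms and any other string naming a coupling class; the function names are hypothetical.

```python
def coupled(m):
    return m not in ("g", "i")


def covers(A, a, B, b):
    """True if class a of mode A covers class b of mode B, i.e. every
    position holding b in B holds a in A."""
    return all(A[i] == a for i in range(len(B)) if B[i] == b)


def compare_elements(M, Mp, i):
    """Relation between M[i] and Mp[i]: '>', '<', '=' or '<>'."""
    a, b = M[i], Mp[i]
    if coupled(a) and coupled(b):
        ab, ba = covers(M, a, Mp, b), covers(Mp, b, M, a)
        if ab and ba:
            return "="
        if ab:
            return ">"
        if ba:
            return "<"
        return "<>"
    if coupled(a):
        return ">"            # c > g and c > i
    if coupled(b):
        return "<"
    if a == b:
        return "="
    return ">" if a == "i" else "<"   # i > g


def compare_modes(M, Mp):
    """Lift the element ordering to whole modes of equal length."""
    rel = {compare_elements(M, Mp, i) for i in range(len(M))}
    if rel <= {"="}:
        return "="
    if rel <= {">", "="}:
        return ">"
    if rel <= {"<", "="}:
        return "<"
    return "<>"


r1 = compare_modes(("g", "i"), ("g", "g"))
r2 = compare_modes(("c1", "c1", "i"), ("i", "c2", "c2"))
r3 = compare_modes(("c3", "c3", "c3"), ("c1", "c1", "i"))
r4 = compare_modes(("c1", "c1"), ("c7", "c7"))
```

Note that class names are local to each mode: in the last comparison `c1` and `c7` occupy the same positions, so each covers the other and the two modes are equal.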

In algorithm 6.1, there are situations in which it is necessary to find a mode which is worse than or equal to both of two other modes. Such a mode is called a worst-case generalization. For example, if a clause is called with two modes $M$ and $M'$, a worst-case generalization $M''$ can be found from which a schedule can be derived which is safe for both $M$ and $M'$. We would like $M''$ to be as "good" as possible so that the maximum potential parallelism will be available in $M''$. By "as good as possible" we mean that $M'' \geq M$, $M'' \geq M'$, and for all $M'''$ such that $M''' \geq M$ and $M''' \geq M'$, it must also hold that $M''' \geq M''$ (writing $\geq$ for "worse than or equal to"). That is, $M''$ is the best mode that is worse than or equal to the two modes it is generalizing. We call $M''$ the optimal worst-case generalization (owcg) of $M$ and $M'$.

Let $M=(m_1, \ldots, m_n)$ and $M'=(m_1', \ldots, m_n')$ be modes. Then the optimal worst-case generalization $owcg(M,M')=M''=(m_1'', \ldots, m_n'')$ is computed as follows:

    for each j from 1 to n
        if $m_j$ = g and $m_j'$ = g, then $m_j''$ = g
        if $m_j$ = g and $m_j'$ = i, then $m_j''$ = i
        if $m_j$ = i and $m_j'$ = i, then $m_j''$ = i
        if $m_j = c_i$ and $m_j' = c_k$, then $m_j'' = c_l$ where
            $c_l$ covers both $c_i$ and $c_k$
        if $m_j$ = i or g and $m_j' = c_k$,
            or $m_j = c_k$ and $m_j'$ = i or g,
            then $m_j'' = c_l$ where $c_l$ covers $c_k$.

Table  6.1  gives  some  examples  of  optimal  worst-case  generalizations. 


Table 6.1

Optimal worst-case generalizations

M              M'             M''

(g,i)          (i,g)          (i,i)
(g,i)          [illegible]    [illegible]
(c1,c1,i)      (i,c2,c2)      (c3,c3,c3)
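
The computation of owcg can be sketched in Python as follows. This is a hedged illustration, not the thesis implementation: it assumes that a covering class is obtained by merging the argument positions of every coupling class involved, it treats the (g,i) and (i,g) cases symmetrically, and it numbers the resulting classes from c1, so only the grouping of positions is meaningful, not the label names. The function name owcg follows the text; everything else is hypothetical.

```python
def owcg(m1, m2):
    """Optimal worst-case generalization of two modes of equal length."""
    n = len(m1)
    parent = list(range(n))          # union-find over argument positions

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    coupled = [False] * n
    for mode in (m1, m2):
        first = {}                   # first position seen for each label
        for j, m in enumerate(mode):
            if m in ("g", "i"):
                continue
            coupled[j] = True
            if m in first:
                parent[find(first[m])] = find(j)
            else:
                first[m] = j

    labels, result = {}, []
    for j in range(n):
        if coupled[j]:               # coupled in either mode: stays coupled
            root = find(j)
            if root not in labels:
                labels[root] = "c%d" % (len(labels) + 1)
            result.append(labels[root])
        elif "i" in (m1[j], m2[j]):
            result.append("i")
        else:
            result.append("g")
    return tuple(result)
```

With this sketch, owcg(("c1","c1","i"), ("i","c2","c2")) puts all three positions into a single class, which corresponds to the (c3,c3,c3) entry of table 6.1 up to renaming of the label.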

It can be shown that M'' is the optimal wcg. Let M''=owcg(M,M') be
computed by the above algorithm. We examine three cases: M>M', M=M', and
M<>M'.

1) (M=M') If M=M', then M''=M and M''=M' as follows:

for each j, 1≤j≤n, m_j=m_j', which means that

• m_j = g and m_j' = g, in which case m_j'' = g.

• m_j = i and m_j' = i, in which case m_j'' = i.

• m_j=c_i and m_j'=c_k, where c_i=c_k, in which case m_j''=c_l where c_l=c_i
and c_l=c_k (c_l covering both c_i and c_k)

Thus, M''=M' and M''=M. For any M''' ≥ M or M', it must therefore hold
that M''' ≥ M''.
2) (M>M') Let M''=owcg(M,M') and M>M'. Then for each
j, 1≤j≤n, one of the following two possibilities holds:

• m_j>m_j', in which case m_j''=m_j.

• m_j=m_j', in which case m_j''=m_j.

Thus, M''=M, and any M''' ≥ M must also be ≥ M''.

3) (M<>M') Let M''=owcg(M,M') and M<>M'. Then for each
j, 1≤j≤n, one of the following possibilities holds:

• m_j>m_j', in which case m_j''=m_j.

• m_j'>m_j, in which case m_j''=m_j'.

• m_j=m_j', in which case m_j''=m_j. (wlog)

• m_j<>m_j' (only true when m_j=c_i, m_j'=c_k, and c_i<>c_k), in which
case m_j''=c_l where c_l covers both c_i and c_k. Thus, m_j''>m_j and
m_j''>m_j'.

Thus, by the definition of the > ordering, M'' ≥ M and M'' ≥ M'.
To show that M'' is optimal, let M'''=(m_1''', ..., m_n''') be some mode such
that M''' ≥ M, M''' ≥ M', and M''>M'''. By definition of >, there exists a
k, 1≤k≤n, such that m_k''>m_k'''. Examining the corresponding elements of M
and M',

• If m_k>m_k', then m_k''=m_k by definition of owcg. Therefore m_k>m_k''',
which implies that M''' ≥ M does not hold; a contradiction.

• If m_k'>m_k, then m_k''=m_k'. This means that m_k'>m_k''', implying that
M''' ≥ M' does not hold; again a contradiction.

• If m_k=m_k', then m_k''=m_k (wlog). This, too, leads to a contradiction.

• If m_k<>m_k', then m_k''=c_i such that c_i covers m_k and m_k'. Assume (wlog)
that m_k'''=c_j for some j. Since m_k''>m_k''', c_i must cover c_j, but not vice
versa. Since c_i covers m_k and m_k' and c_j does not cover c_i, c_j must not
cover either m_k or m_k'. Therefore, either M''' ≥ M or M''' ≥ M' does not
hold; again a contradiction.

Therefore, if M''=owcg(M,M'), there can be no mode M''' such that
M''' ≥ M, M''' ≥ M', and M''>M''', and M'' is the optimal worst-case
generalization.


Given a clause

h :- g_1, ..., g_n

the variable status V_i represents the coupling relationships among the variables
occurring in the clause up to and including subgoal g_i. V_i is a triple (G_i,I_i,C_i)
where G_i is the set of all variables which are ground after returning from subgoal
g_i, I_i is the set of all variables which are independent at that point, and C_i is the
partition of all coupled variables into coupling classes. V_0 represents the variable
status after the clause head and before the first subgoal.


Variable status triples are constructed from the previous variable status triple
(i.e., V_i from V_{i-1}) and the exit mode of subgoal g_i. V_0 may be computed from
the clause entry mode and the clause head. The entry mode for a subgoal g_i can
be computed from V_{i-1} and the text of the subgoal itself. Finally, the clause exit
mode may be computed from the clause head and V_n. The remainder of the
section shows how variable status triples and subgoal entry modes are generated.

Execution of a subgoal may be considered to transform the variable status
which existed before the subgoal was executed to that which exists after the
subgoal is executed. In other words, V_i is a function of a subgoal g_i (the ith
subgoal in a clause), its exit mode M_i, and the previous variable status V_{i-1}. The
basic principles behind generating a new V_i are the following:

• if two variables in different coupling classes in V_{i-1} are coupled in the exit
mode M_i, the coupling classes are joined in V_i.

• if a variable is ground according to M_i, it becomes ground in V_i, regardless
of what it was in V_{i-1}.

• a variable is independent in V_i if and only if it is not ground, and either it
was independent in V_{i-1} and nothing has been done to change that in M_i
(such as binding it to a coupled term), or it is a member of a singleton
coupling class.

• all variables in a ground term are ground.

• all variables in an independent term must be assumed to be coupled to each
other. We must assume this because it is the worst case. Likewise, we must
assume this for coupled terms.

An algorithm for generating V_i is presented as algorithm 6.2.

Generating V_0 from the clause head is a special case of algorithm 6.2. For
the input variable status V_{-1}, we use the empty status G_{-1}=I_{-1}=C_{-1}=∅;
instead of the subgoal, we use the clause head; and instead of the subgoal exit
mode, we use the clause entry mode. The algorithm will yield V_0.




Algorithm 6.2 - Generating V_i from V_{i-1}, etc.

Input: Previous variable status V_{i-1}=(G_{i-1},I_{i-1},C_{i-1}),
subgoal t=f(t_1, ..., t_n), and the subgoal’s exit mode M=(m_1, ..., m_n).

Output: New variable status V_i.

Algorithm:

V_i = V_{i-1};

for each c_j which appears as at least one element in M:
    ccl = ∅;
    for each m_k such that m_k=c_j:
        ccl = ccl ∪ {all variables in t_k}
    for each c_class in C_i:
        if ccl ∩ c_class ≠ ∅
        then
            ccl = ccl ∪ c_class
            C_i = C_i - {c_class}
    C_i = C_i ∪ {ccl}
for each k from 1 to n
    if m_k = g
    then
        G_i = G_i ∪ {all variables in t_k}
        I_i = I_i - G_i
        remove any variables in G_i from all coupling classes in C_i
    else if m_k = i
    then
        ccl = {all variables in t_k}
        for each c_class in C_i
            if ccl ∩ c_class ≠ ∅
            then
                ccl = ccl ∪ c_class
                C_i = C_i - {c_class}
        C_i = C_i ∪ {ccl}
for each c_class in C_i /* clean up C_i */
    if |c_class| = 1
    then
        I_i = I_i ∪ c_class
        C_i = C_i - {c_class}
    else if c_class ∩ I_i ≠ ∅
    then I_i = I_i - c_class
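
Algorithm 6.2 can be sketched in Python along the following lines. This is a minimal illustration rather than the thesis implementation. It assumes that each argument t_k is represented only by the set of variable names it contains, that a status is a triple (G, I, C) with C a list of sets, and that variables already ground are kept out of coupling classes and empty classes are dropped during cleanup; the last two points are my additions, and the function name is hypothetical.

```python
def next_status(status, arg_vars, exit_mode):
    """Compute V_i from V_{i-1}, the argument variable sets and the exit mode."""
    G, I = set(status[0]), set(status[1])
    C = [set(c) for c in status[2]]

    def absorb(ccl):
        """Merge ccl with every class it overlaps, then add it to C."""
        merged, kept = set(ccl), []
        for c in C:
            if c & merged:
                merged |= c
            else:
                kept.append(c)
        C[:] = kept + [merged]

    # Arguments that share a coupling label in the exit mode become coupled.
    for label in sorted({m for m in exit_mode if m not in ("g", "i")}):
        ccl = set()
        for vs, m in zip(arg_vars, exit_mode):
            if m == label:
                ccl |= set(vs)
        absorb(ccl)

    for vs, m in zip(arg_vars, exit_mode):
        if m == "g":
            G |= set(vs)
            I -= G
            for c in C:
                c -= G
        elif m == "i":
            absorb(set(vs) - G)      # assumption: ground variables stay ground

    # Clean up: singleton classes become independent variables.
    kept = []
    for c in C:
        c = c - G                    # assumption: drop ground leftovers
        if len(c) == 1:
            I |= c
        elif len(c) > 1:
            I -= c
            kept.append(c)
    return G, I, kept
```

Calling next_status with an empty status, the clause head's arguments, and the clause entry mode gives V_0, as described for the special case above.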


unnecessary  worst-case  results.  Instead,  a  more  direct  approach  will  be  used. 
When A and B are simple variables, table 6.2 indicates the appropriate action to
take in transforming V_{i-1} into V_i.

Table 6.2

Unify subgoal transformations on V_{i-1}

                                  B ∈ G_{i-1}    B ∈ I_{i-1} or      B ∈ c_class
                                                 not in V_{i-1}      in C_{i-1}

A ∈ G_{i-1}                            a               b                  e
A ∈ I_{i-1} or not in V_{i-1}          c               d                  g
A ∈ c_class in C_{i-1}                 f               h                  i

First, V_i = V_{i-1}; then

a) No action

b) G_i = G_i ∪ {B}, I_i = I_i - {B}

c) G_i = G_i ∪ {A}, I_i = I_i - {A}

d) Create new coupling class {A,B}, add to C_i. Remove A and/or B from I_i.

e) Add B to G_i, remove B from coupling class in C_i. If coupling class now
singleton, move remaining element to I_i. Remove coupling class.

f) Same as e), but use A instead of B.

g) Add A to B’s coupling class. Remove A from I_i, if necessary.

h) Same as g), but exchange A and B in above definition.

i) Combine A and B’s coupling classes in C_i.
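
The nine cases of table 6.2 can be sketched as a single Python function. This is an illustration only: the status is assumed to be a triple (G, I, C) with C a list of sets, a variable found nowhere in the status is treated as independent, and the function and helper names are hypothetical.

```python
def unify_variables(status, a, b):
    """Apply the table 6.2 action for the unify subgoal a = b (simple variables)."""
    G, I = set(status[0]), set(status[1])
    C = [set(c) for c in status[2]]

    def class_of(v):
        return next((c for c in C if v in c), None)

    def kind(v):
        if v in G:
            return "g"
        return "c" if class_of(v) is not None else "i"

    def make_ground(v):              # actions b, c, e and f
        cls = class_of(v)
        if cls is not None:
            cls.discard(v)
            if len(cls) == 1:        # class collapsed to a singleton
                I.update(cls)
                C.remove(cls)
        I.discard(v)
        G.add(v)

    ka, kb = kind(a), kind(b)
    if a == b or (ka == "g" and kb == "g"):
        pass                         # action a
    elif ka == "g":
        make_ground(b)               # actions b, e
    elif kb == "g":
        make_ground(a)               # actions c, f
    elif ka == "i" and kb == "i":
        I.difference_update({a, b})  # action d
        C.append({a, b})
    elif ka == "i":
        class_of(b).add(a)           # action g
        I.discard(a)
    elif kb == "i":
        class_of(a).add(b)           # action h
        I.discard(b)
    else:
        ca, cb = class_of(a), class_of(b)
        if ca is not cb:             # action i
            C.remove(cb)
            ca |= cb
    return G, I, C
```

The early test for a == b is an added guard, not part of the table, so that unifying a variable with itself leaves the status unchanged.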

More complex unify goals may be transformed into the simple one above. If
A and B are both structures, e.g., f(A_1, ..., A_n) = f(B_1, ..., B_n), they may be
replaced with a series of simpler unifications, e.g., A_1=B_1, ..., A_n=B_n.

If only A is a structure, e.g., f(A_1, ..., A_n) = B, we follow this
procedure:

• if B is ground, A_1, ..., A_n all become ground.

• if A_1, ..., A_n are all ground, then B becomes ground.

• Otherwise, replace the above unify goal with f(A_1, ..., A_n) = f(B_1, ..., B_n)
where a coupling class containing B, B_1, ..., B_n is added to V_{i-1}.

To generate a subgoal entry mode, one needs the subgoal (say, the ith one),
and the previous variable status triple, V_{i-1}. The algorithm is simple and is given
below as algorithm 6.3. Generating a clause exit mode is a special case of the
above. Instead of a subgoal, the clause head is used, as is V_n. The result is the
clause exit mode.


Algorithm 6.3 - Generating a subgoal entry mode

Input: ith subgoal t = f(t_1, ..., t_n),
previous variable status V_{i-1}.

Output: entry mode M = (m_1, ..., m_n).

Algorithm:
for j = 1 to n
    if t_j contains no independent or coupled variables
    then m_j = g
    else if t_j contains no coupled variables and any
        independent variable in t_j appears in no other
        subterm of t
    then m_j = i
    else if any coupled variables in t_j appear in no other
        subterm of t
    then m_j = i
        /* because the entered clause will only “see” the variable(s)
           once and they will seem independent */
    else /* linked to other subterms */
        if t_j contains a variable which appears in a previous
            subterm t_k or contains a variable coupled to a
            variable appearing in a previous subterm t_k
            (previous: t_k where 1≤k<j)
        then m_j = c_l where m_k is c_l
        else m_j = c_l for some new, unique, l.
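
A Python sketch of algorithm 6.3 follows. It is an illustration under stated assumptions, not the thesis implementation. Each argument t_j is represented by the set of variable names it contains, and variables absent from the status are treated as independent. The two tests that yield mode i are read conservatively as "t_j is linked to no other subterm", where two subterms are linked if they share a non-ground variable or contain variables of the same coupling class. That reading, and the names used, are my own.

```python
def entry_mode(status, arg_vars):
    """Compute a subgoal entry mode from V_{i-1} and the argument variable sets."""
    G, I, C = status

    def class_key(v):
        for c in C:
            if v in c:
                return frozenset(c)
        return frozenset([v])        # independent or new variable: own class

    # For each argument, the classes of its non-ground variables.
    keys = [{class_key(v) for v in vs if v not in G} for vs in arg_vars]

    mode, fresh = [], 0
    for j, kj in enumerate(keys):
        if not kj:                   # no independent or coupled variables
            mode.append("g")
            continue
        linked = [k for k in range(len(keys)) if k != j and keys[k] & kj]
        if not linked:               # seen only once by the entered clause
            mode.append("i")
            continue
        previous = [k for k in linked if k < j]
        if previous:                 # reuse the label of an earlier subterm
            mode.append(mode[previous[0]])
        else:                        # first member of a new coupling class
            fresh += 1
            mode.append("c%d" % fresh)
    return tuple(mode)
```

As in the pseudocode, a linked subterm takes the label of the first earlier subterm it is linked to; the sketch does not attempt to merge labels when linkage is only indirect.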


One should note that the clause entry modes described here are equivalent to
the goal modes described in chapters 3 and 4, which are used in scheduling. Head
modes, as described in chapters 3 and 4, have no real equivalent here, although
they may be generated using algorithm 6.3, where V_n=(G_n,I_n,C_n) such that
G_n=C_n=∅ and I_n={all variables in the clause head}.*

* One SDDA issue which is irrelevant to unification scheduling, but is important to
AND-parallelism and backtracking, is the notion of the generator of a variable, which is
used in generating a dependency graph for the clause. Chang describes the generator of a
variable as any previous subgoal in the clause that “contributes to the binding of the
variable.” Aside from questions about the precise meaning of “contributing to the binding of
a variable” (e.g., does adding another variable to X's coupling class “contribute” to X’s
binding?), the non-enhanced SDDA proposed by Chang cannot handle even simple,
obvious cases of “contributing to binding.” For example, in the subgoal
..., Z = g(X,Y), f(Z), ...

where X, Y, and Z are all independent before the call, the entry mode of the call will
simply be entry(f,1,(i)), since structure elements are not distinguished. If f contains the single
clause

n«(».j). 

X will have been bound, but the exit mode of the call will still be exit(f,1,(i)), and Z will
still be independent since Y is independent. Also, there is no evidence of X having been